Peptide information for frame 1



ORF from 202 bp to 876 bp; peptide length: 225 
Category: similarity to known protein 
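
The peptide length follows from the ORF coordinates. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the helper name `orf_peptide_length` is hypothetical and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF {start_bp}..{end_bp} is not a whole number of codons")
    return span // 3


# ORF from 202 bp to 876 bp, as reported for this frame.
print(orf_peptide_length(202, 876))  # 225
```

Under the same assumptions, the other ORFs in this listing with explicit start and end coordinates (23 to 1855 bp, 320 to 892 bp, 64 to 1803 bp) give 611, 191 and 580 residues, matching the reported peptide lengths.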



1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 
51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 
101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII 
151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 
201 SLSVSQPCET DSSSPQVSWK RGIEE 

BLASTP hits 

Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL :HSSDS22MR_1 gene: "sds22"; 

product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA

Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143 

Entry A38439 from database PIR: 

suppressor protein sds22(+) - fission yeast (Schizosaccharomyces pombe) 
>TREMBL:SPSDS22_1 gene: "sds22+"; S. pombe sds22+ gene, complete cds.
Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127

Entry S43988 from database PIR: 

protein suppressor sds22 - fission yeast (Schizosaccharomyces pombe) 
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT 
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1
regulatory subunit"; S. pombe chromosome I cosmid c4A8.
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127

Entry CEK10D2_5 from database TREMBL: 

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2. 

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125



Alert BLASTP hits for DKFZphutel_20mll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_20mll, frame 1



Report for DKFZphutel_20mll.1



[LENGTH] 225 

[MW] 25955.87

[pI] 4.63

[HOMOL] PIR:S68209 sds22 protein homolog - human 1e-18

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation,
palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YOR373w] 2e-06

[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 04.07 rna transport [S. cerevisiae, YPL169c] 9e-04 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04 

[EC] 4.6.1.1 Adenylate cyclase 2e-06 

[PIRKW] nucleus 5e-16 

[PIRKW] duplication 2e-06 

[PIRKW] tandem repeat 2e-06 

[PIRKW] cAMP biosynthesis 2e-06 

[PIRKW] glycoprotein 2e-06 

[PIRKW] phosphorus-oxygen lyase 2e-06 

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 5e-16 

[SUPFAM] fibromodulin 3e-07 

[SUPFAM] yeast adenylate cyclase catalytic domain homology 2e-06 

[SUPFAM] yeast adenylate cyclase 2e-06 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 1 
[KW] All_Alpha

SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPCEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN 

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA 

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 



Prosite for DKFZphutel_20mll.1

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 
PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006
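
The three hits above can be cross-checked with a simple pattern scan. The sketch below assumes the usual consensus patterns for these two PROSITE entries, `[ST]-x-[RK]` for PKC_PHOSPHO_SITE and `[ST]-x(2)-[DE]` for CK2_PHOSPHO_SITE, written as Python regular expressions; the patterns and the helper `site_starts` are assumptions for illustration and are not taken from this report.

```python
import re

# Peptide for frame 1 as listed above (225 residues).
PEPTIDE = (
    "MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQL"
    "DFRNILRIDNLWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIE"
    "TIEGLDTLVNLEDLSLFNNRISKIDSLDALVKLQVLSLGNNRIDNMMNII"
    "YLRRFKCLRTLSLSRNPISEAEDYKMFICAYLPDLMYLDYRRIDDHTASV"
    "SLSVSQPCETDSSSPQVSWKRGIEE"
)

# Assumed consensus patterns, wrapped in lookaheads so overlapping sites are reported.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"(?=[ST].[RK])",
    "CK2_PHOSPHO_SITE": r"(?=[ST]..[DE])",
}


def site_starts(sequence, pattern):
    """Return the 1-based start position of every match of `pattern` in `sequence`."""
    return [m.start() + 1 for m in re.finditer(pattern, sequence)]


for name, pattern in PATTERNS.items():
    print(name, site_starts(PEPTIDE, pattern))
```

With these assumed patterns the scan reports start positions 218 for the PKC site and 122 and 169 for the CK2 sites, the same start positions as in the table above.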



(No Pfam data available for DKFZphutel_20mll.1)
DKFZphutel_20m24



group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical
C. elegans protein and to the yeast Alg9 protein.

This protein is a putative mannosyl transferase involved in the assembly of the core
oligosaccharide Glc3Man9GlcNAc2.

The new protein can find application in modulating the glycosylation of proteins and as a new
enzyme for biotechnological production processes.



strong similarity to S. cerevisiae Alg9p

complete cDNA, complete cds, potential start at Bp 23, few EST hits 
Alg9 is involved in the assembly of the core oligosaccharide 
Glc3Man9GlcNAc2 

HSAC381 corresponding genomic DNA (2 exons) 
HSB8954 corresponding genomic DNA (1 exon)

Sequenced by AGOWA 

Locus: /map="11"

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 



1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC 
51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG 
101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC 
151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA 
201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC 
251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA 
301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG 
351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT 
401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG 
451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG 
501 CTGTGTGCAA GAAGTTTGGG TTGCACGTGA GTCGAATGAT GCTAGCCTTC 
551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG 
601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG 
651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC 
701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT 
751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA 
801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG 
851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA 
901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG 
951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA 
1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT 
1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA 
1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT 
1151 CCACTTATAT GTCTCTGTGG CGCTGTGGCT CTCTCTGCAC TTCAGAAATG 
1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT 
1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT 
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA 
1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG 
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC 
1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA 
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC 
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA 
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG 
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA 
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG 
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA 
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG 
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA 
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA 
1951 ATAAAGGTCT TCTGACATGA AAAAAAAAAA AAAAAA 
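
The reported polyadenylation signal position can be checked against the sequence tail. The sketch below searches for the canonical AATAAA hexamer; treating the reported position as a 0-based offset is an assumption that happens to reproduce the value 1949 for this insert, and the helper `find_signal_offsets` is hypothetical.

```python
def find_signal_offsets(sequence, offset=0, signal="AATAAA"):
    """Return the 0-based offsets (shifted by `offset`) of every occurrence of `signal`."""
    hits = []
    i = sequence.find(signal)
    while i != -1:
        hits.append(offset + i)
        i = sequence.find(signal, i + 1)  # step by one so overlapping hits are kept
    return hits


# Bases 1941..1986 of the insert, copied from the last two rows of the listing above.
TAIL = "AACATTTGTA" "ATAAAGGTCT" "TCTGACATGA" "AAAAAAAAAA" "AAAAAA"

# The tail starts at 1-based position 1941, i.e. 0-based offset 1940.
print(find_signal_offsets(TAIL, offset=1940))
```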



BLAST Results 



Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJ159ol, complete sequence. 
Length = 42,771 

Entry HSB8954 from database EMBL: 






WO 01/12659 



PCT/IB00/01496 



cSRL-50A3-u cSRL flow sorted Chromosome 11 specific cosmid Homo 
sapiens genomic clone CSRL-50A3. 
Length = 601 



Medline entries 



96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae: 
identification of the ALG9 gene encoding a putative 
mannosyl transferase. 



Peptide information for frame 2 



ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 
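
The start of the peptide below can be reproduced by translating the cDNA from the stated ORF start. The sketch assumes the standard genetic code and 1-based coordinates; `translate` and `CODON_TABLE` are illustrative names, not part of the original analysis pipeline.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start_bp):
    """Translate from 1-based position `start_bp` until a stop codon or the end of `dna`."""
    peptide = []
    for i in range(start_bp - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 50 bases of the DKFZphutel_20m24 insert, copied from the sequence listing.
FIRST_50 = "TTCTTTTTTCCCCAGGCTTGCCATGGCTAGTCGAGGGGCTCGGCAGCGCC"
print(translate(FIRST_50, 23))  # MASRGARQR
```

The nine residues obtained from this fragment agree with the first nine residues of the peptide listed for frame 2.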



1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA 
51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF 
101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS 
151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT 
201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS 
251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT 
301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT 
351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR 
401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA 
451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK
501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK 
551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR 
601 KAKQIRKKSG G 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20m24, frame 2

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME
II., N = 1, Score = 957, P = 2.7e-96

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 533, P = 2.3e-51 

>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II. 

Length = 653 

HSPs: 

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96 
Identities = 206/514 (40%), Positives = 296/514 (57%) 

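Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107

W + FK LLS R+ A+ I + DCDE +NYWEP H +YGEGFQTWEYSP

Sbjct: 43 [subject sequence not recoverable from the source] 102

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167

YAIRSY Y+ LH PA+ A + KI+VF +R + + E Y + A+-CKK

Sbjct: 103 [subject sequence not recoverable from the source] 162

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227

+ F + S+GMF +S+AF+PSSFCM T

Sbjct: 163 [subject sequence not recoverable from the source] 222

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287

PFSA LGLPI D+L++K F SL+ + V+ DS+Y+GK V+APLNI LY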

Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282 

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347 

NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 
Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNIVIFAAPFGFPLS--LAYFTKVWMSQDRNVAL 340

Query: 348 WLTLAPMYI WFIIFFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400

+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 

Sbjct: 341 YQRFAPIILLAVTTAAWLLIFGSQAHKEERFLFPIYPFIAFFAALALDATNR---LCLKK 397

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460

++ N L++ + F +LS SR+ ++ Y +++Y T+ T + 

Sbjct: 398 LGMD NILSILFILCFAILSASRTYSIHNNYGSHVEIYRSLNAELTNRT-NFKNF 450 

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW QLQFIPSEFRGQLPKPFAEGPL ATRI 511

P+ VCVGKEW+RFPSSF +P +++FI SEFRG LPKPF + TR 

Sbjct: 451 HDPIRVCVGKEWHRFPSSFFIPQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510 

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 

+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQNY 556
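
The percentages in the HSP summary line can be recomputed from the fractions. Judging from the values in this listing (296/514 is about 57.6 % but is reported as 57 %), the percentages appear to be truncated rather than rounded; the sketch below rests on that observation, and `parse_hsp_fractions` is a hypothetical helper.

```python
import re

HSP_LINE = "Identities = 206/514 (40%), Positives = 296/514 (57%)"


def parse_hsp_fractions(line):
    """Return {label: (matches, aligned_length, reported_percent)} for one HSP summary line."""
    pattern = r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
    return {
        label: (int(matches), int(length), int(percent))
        for label, matches, length, percent in re.findall(pattern, line)
    }


for label, (matches, length, reported) in parse_hsp_fractions(HSP_LINE).items():
    truncated = 100 * matches // length  # integer division drops the fractional part
    print(label, truncated, reported)
```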



Pedant information for DKFZphutel_20m24, frame 2



Report for DKFZphutel_20m24.2



[LENGTH] 611

[MW] 69863.78

[pI] 8.91

[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-69

[PIRKW] glycosyltransferase 9e-68 

[PIRKW] transmembrane protein 9e-68 

[PIRKW] hexosyltransferase 9e-68 

[PROSITE] MYRISTYL 9 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] TRANSMEMBRANE 7 

[KW] LOW_COMPLEXITY 6.71 % 



SEQ MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 

SEG 

PRD ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccccccch 

MEM MMMMMM 

SEQ AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH

SEG . . . xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhccccceeeccccceeeeeccccceeecccchhhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

SEG 

PRD cchhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 

SEG xxxxxxxxxxxxx 

PRD cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 

MEM MMMMMMMMMMMMMM 

SEQ LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLYNVFTPHGPDLYGT

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 

MEM MMMMMMM . MMMMMMMMMMMMMMMMMMMMM 

SEQ EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 

SEG xxxxxxxxxxxxxxx 

PRD cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM
SEQ FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL

SEG

PRD hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM .

SEQ FGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL

SEG

PRD eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc

MEM

SEQ LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD

SEG

PRD ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc

MEM

SEQ TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR

SEG

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc

MEM

SEQ KAKQIRKKSGG

SEG

PRD hhhhhhccccc

MEM



Prosite for DKFZphutel_20m24.2



PS00001 77->81 ASN_GLYCOSYLATION PDOC00001
PS00001 593->597 ASN_GLYCOSYLATION PDOC00001
PS00004 606->610 CAMP_PHOSPHO_SITE PDOC00004
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005
PS00005 133->136 PKC_PHOSPHO_SITE PDOC00005
PS00005 541->544 PKC_PHOSPHO_SITE PDOC00005
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005
PS00005 572->575 PKC_PHOSPHO_SITE PDOC00005
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 457->461 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00006 545->549 CK2_PHOSPHO_SITE PDOC00006
PS00006 553->557 CK2_PHOSPHO_SITE PDOC00006
PS00008 12->18 MYRISTYL PDOC00008
PS00008 14->20 MYRISTYL PDOC00008
PS00008 32->38 MYRISTYL PDOC00008
PS00008 47->53 MYRISTYL PDOC00008
PS00008 166->172 MYRISTYL PDOC00008
PS00008 182->188 MYRISTYL PDOC00008
PS00008 218->224 MYRISTYL PDOC00008
PS00008 222->228 MYRISTYL PDOC00008
PS00008 234->240 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_20m24.2)
DKFZphutel_21dl5
group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

Sequenced by MediGenomix 
Locus: /chromosome="3" 
Insert length: 5292 bp 

Poly A stretch at pos. 5273, polyadenylation signal at pos. 5252 

1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 
51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 

101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 

151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 

201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 

251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 

301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 

351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 

401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 

451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 

501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 

551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 

601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 

651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 

701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 

751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 

801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 

851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 

901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 

951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 
1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 
1051 CAGCCAGATC CTGCCTACTG AAGAGTATGA AGAGGCAATG AGCACTATGC 
1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 
1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 
1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 
1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTCC TATCCCCGTG 
1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 
1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 
1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 
1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC 
1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 
1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 
1601 GAGGCCAGCT AGAGCTTGGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 
1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 
1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 
1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 
1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 
1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 
1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 
1951 GTGAAGAATC TGGCCGTGAC CTGGCTGCCA CACTGCTATA GGCCCCAGAA 
2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 
2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC 
2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 
2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 
2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 
2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 
2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 
2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 
2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 
2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG 
2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT 
2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 
2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA 
2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 
2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 
2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 
2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT
2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG 
2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG 
3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA 
3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT 
3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT 
3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC 
3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA 
3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC 
3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT 
3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA 
3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA 
3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA 
3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA 
3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC 
3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG 
3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT 
3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA 
3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT 
3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC 
3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG 
3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG 
3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG 
4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG 
4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT 
4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT 
4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT 
4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT 
4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC 
4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA 
4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC
4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG
4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC
4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAACTGG GCTATTACTC
4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC 
4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT
4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC
4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG 
4751 TGACACACGG AGGCACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC 
4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG 
4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG 
4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG
4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG
5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC 
5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGGTTCCCA GCCGCGGGTC 
5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA 
5151 CTAAAGGTCT GGCCAATGTC TTGCCCCACC CCGCCAGCCG CGATACGGCG 
5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT 
5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG 



BLAST Results 



Entry HSU64252 from database EMBL: 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA 
101 RARPGCHGGS GGDRPAA 



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphutel_21dl5, frame 1 
No Alert BLASTP hits found 



Peptide information for frame 2 



ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC 

51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL FVHYSNGDES SDPGPQHRAQ 

101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 

151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2,
Score = 106, P = 0.0067 



>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298

HSPs: 



Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03
Identities = 36/103 (34%), Positives = 44/103 (42%) 

Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144 

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLSGPGRLGPFWAARSGAPAARCAP 189 

AAP AA ++R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03
Identities = 8/21 (38%), Positives = 9/21 (42%)

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 
Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232 

Pedant information for DKFZphutel_21dl5, frame 1 



Report for DKFZphutel_21dl5.1

[LENGTH] 117 

[MW] 11797.32 

[pI] 10.68

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22 

[KW] LOW_COMPLEXITY 38.46 %

SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 

SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc 

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
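
The LOW_COMPLEXITY figure in the report above is consistent with the fraction of residues masked by SEG. The sketch below assumes that each `x` in a SEG row marks one masked residue; the run lengths (14 and 31) are read from the two SEG rows of this listing, and `low_complexity_percent` is a hypothetical helper.

```python
def low_complexity_percent(seg_masks, peptide_length):
    """Return the percentage of residues marked 'x' across all SEG mask rows."""
    masked = sum(mask.count("x") for mask in seg_masks)
    return 100.0 * masked / peptide_length


# SEG rows for DKFZphutel_21dl5.1, written as run lengths read from the listing (assumption).
MASKS = ["x" * 14, "x" * 31]
PEPTIDE_LENGTH = 117  # [LENGTH] value from the report above

print(f"{low_complexity_percent(MASKS, PEPTIDE_LENGTH):.2f} %")  # 38.46 %
```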



(No Prosite data available for DKFZphutel_21dl5.1)
(No Pfam data available for DKFZphutel_21dl5.1)

Pedant information for DKFZphutel_21dl5, frame 2 
Report for DKFZphutel_21dl5.2

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 

SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG 

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 



MEM 



SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . xxxx 

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 

MEM 

SEQ GAPAARCAPFP 

SEG xxxxxxxxx.. 

PRD ccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_21dl5.2)
(No Pfam data available for DKFZphutel_21dl5.2) 
DKFZphutel_22d2



group: signal transduction 

DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the Ras
protein. Additionally, the putative protein contains an EF-hand for calcium binding.

G-proteins are involved in various signal transduction pathways, transferring the signal of a 
cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and YAL048c 



Sequenced by BMFZ 
Locus: /map="17"
Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 



1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC 
51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC 
101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT 
151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT 
201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC 
251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCTAA TGTCATCTGT 
301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG 
351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT 
401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG 
451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA 
501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC 
551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG 
601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC 
651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC 
701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC 
751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT 
801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA 
851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC
901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA 
951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT 
1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT 
1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA 
1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC 
1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA 
1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC 
1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC 
1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT 
1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA 
1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT 
1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG 
1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA 
1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC 
1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA 
1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG 
1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG 
1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT 
1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG 
1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT 
1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG 
1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC 
2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC 
2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA 
2101 AAATGGGTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG 
2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT 
2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG 
2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA 
2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA 
2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT 
2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC 
2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT 
2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT 
2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG 
2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG

2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 

2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 

2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 

2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 

2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 

2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 

2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 

3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 

3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 

3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 

3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 

3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS *** NF1-related locus, Direct Submission;

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396

Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 



1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER
51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL
101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK
151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN
201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG
251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE
301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC
351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT
401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR
451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN
501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK
551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid
K08F11., N = 1, Score = 1357, P = 1.1e-138

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein";
S.pombe chromosome III cosmid c320., N = 1, Score = 889, P = 4.4e-89

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid
C47C12., N = 2, Score = 408, P = 5.6e-74

PIR:S51971 probable membrane protein YAL048c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 677, P = 1.3e-66



>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid
K08F11. 

Length = 625 

HSPS: 
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Score = 1357 (203.6 bits), Expect - l.le-138, P « l.le-138 
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLSIKEE 68 

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKSDLV 123

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQSFGEYHETPVILVGNKSDGT 128 

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEIFYYAQKAVIYPTRPLYDADTKQLTD 187

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243 

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALIRVFKICDRDNDGYLSDTELNDFQKLCFGIPLTSTALEDVKRAVSDGCPDGVAN 247 

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTELNH 303 

L L GFL+LH LFI+RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS----AVTVTRDKKIDLQKKQTQRNVF 419

+ W +TT +++ + EL YLG+ + +A ++ VTR++K DL+ T R VF 

Sbjct: 368 AYWNMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSIRVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDI 476 

+C V+G K+ GK+ +Q+L GR + +I H S + IN V V + KYLLL ++
Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486 

Query: 477 SESEFLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQ 536

S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 
Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546

Query: 537 EYSISPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 580 

+ + P +FCR+ ++P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPIKFSTGNIGQSSSPIFEQLAMMAVYP 590 



Pedant information for DKFZphutel_22d2, frame 1 



Report for DKFZphutel_22d2.1

[LENGTH] 580
[MW] 66541.61
[pI] 5.56
[HOMOL] TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid K08F11. 1e-149
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-81
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR055w] 3e-11
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 8e-09
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 4e-08
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 4e-08
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-08
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-08
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YPR165w] 7e-08
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL005w] 9e-08
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 9e-08
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 9e-08
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YNL093w] 1e-07
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YNL093w] 1e-07
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-07
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 3e-06
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 9e-04
[BLOCKS] BL00410A Dynamin family proteins
[SCOP] d1plk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 2e-42
[SCOP] d1guaa 3.25.1.3.10 Rap1A [Human (Homo sapiens) 5e-59
[PIRKW] transmembrane protein 1e-79
[PIRKW] membrane trafficking 2e-06
[PIRKW] acetylated amino end 3e-09
[PIRKW] prenylated cysteine 3e-09
[PIRKW] signal transduction 1e-07
[PIRKW] transforming protein 3e-09
[PIRKW] immediate-early protein 8e-06
[PIRKW] alternative splicing 4e-08
[PIRKW] P-loop 1e-10
[PIRKW] lipoprotein 7e-10
[PIRKW] proto-oncogene 3e-09
[PIRKW] methylated carboxyl end 3e-09
[PIRKW] membrane protein 3e-09
[PIRKW] GTP binding 1e-10
[PIRKW] thiolester bond 7e-10
[SUPFAM] ras transforming protein 1e-10
[PROSITE] ATP_GTP_A 2
[PROSITE] MYRISTYL 3
[PROSITE] EF_HAND 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] TYR_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 5
[PROSITE] ASN_GLYCOSYLATION 3
[PFAM] Ras family (contains ATP/GTP binding P-loop)
[KW] Irregular
[KW] 3D



SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSE
1jai- ...EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC

SEQ AEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS
1jai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE
1jai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG
1jai-

SEQ VADSGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTE
1jai-

SEQ LNHHAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQ
1jai-

SEQ GFLSQWTLTTYLDVQRCLEYLGYLGYSILTEQESQASAVTVTRDKKIDLQKKQTQRNVFR
1jai-

SEQ CNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDISESE
1jai-

SEQ FLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQEYSI
1jai-

SEQ SPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP
1jai-



Prosite for DKFZphutel_22d2.1

PS00001 118->122 ASN_GLYCOSYLATION PDOC00001
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001
PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005
PS00006 59->63 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 311->315 CK2_PHOSPHO_SITE PDOC00006
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006
PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00007 153->161 TYR_PHOSPHO_SITE PDOC00007
PS00007 376->384 TYR_PHOSPHO_SITE PDOC00007
PS00007 153->162 TYR_PHOSPHO_SITE PDOC00007
PS00007 448->457 TYR_PHOSPHO_SITE PDOC00007
PS00008 240->246 MYRISTYL PDOC00008
PS00008 425->431 MYRISTYL PDOC00008
PS00008 433->439 MYRISTYL PDOC00008
PS00017 11->19 ATP_GTP_A PDOC00017
PS00017 425->433 ATP_GTP_A PDOC00017
PS00018 197->210 EF_HAND PDOC00018



Pfam for DKFZphutel_22d2.1

HMM_NAME Ras family (contains ATP/GTP binding P-loop)

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK
++L+G+ VGK++L ++ EF+EE +P ++ T ++ +++
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIr.NWweEIr
ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ I+
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102

HMM RHCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKT
+ D+D+ P +LVGNK+DL + ++T + +E+SAK+
Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151

HMM NiNVEEAFMEIvRellqrMqeqNqteNinidQpsrnrkrCCCIM*
N+ E F+ + +++L + +++ ++++ + + C+
Query 152 LKNISELFYYAQKAVLHPTGPLYCPEEKEMK-PACI-- 186
DKFZphutel_22el2



group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein with similarity to yeast, C. elegans,
Drosophila and mammalian proteins.

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway
involving the EGF receptor.

The new protein can find application in modulating the cornichon-modulated signal transduction
pathway and also the EGF receptor signaling processes.



strong similarity to S.cerevisiae YGL054C and cornichon 
complete cDNA, complete cds, EST hits 

cornichon is required for signal transduction in the EGF-receptor
signal processing

Sequenced by BMFZ 

Locus: unknown 

Insert length: 519 bp 

Poly A stretch at pos. 499, no polyadenylation signal found 



1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 
51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 
101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 
151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT 
201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 
251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 
301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 
351 TTGAAGTCAG CCTACACTAC AGTGCACAGT TGAGGAGCCA GAGACTTCTT 
401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 
451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 
501 TGAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



95300228: 

cornichon and the EGF receptor signaling process are necessary for both 
anterior-posterior 

and dorsal-ventral pattern formation in Drosophila. 



Peptide information for frame 1 

ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 

1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 
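
The peptide listed above can be cross-checked against the cDNA by translating the ORF directly. This is a minimal sketch using the standard genetic code; the function name `translate` and the constant `CDNA_HEAD` (the first 100 nucleotides of the insert, copied from the sequence listing) are illustrative choices and not part of the original report.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna: str, start: int, end: int) -> str:
    """Translate cdna[start..end] (1-based, inclusive) into a peptide string."""
    orf = cdna[start - 1:end].upper()
    usable = len(orf) - len(orf) % 3          # ignore a trailing partial codon
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


# First 100 nt of the DKFZphutel_22el2 insert; the ORF starts at position 55.
CDNA_HEAD = (
    "GTCGGGGCATCCGAGCGGGTTTGACGGAAGGAGCGGCGGCGACGGAGGAG"
    "GAGGATGGAGGCGGTGGTGTTCGTCTTCTCTCTCCTCGATTGTTGCGCGC"
)
print(translate(CDNA_HEAD, 55, 99))  # MEAVVFVFSLLDCCA
```

The 15 residues obtained from positions 55 to 99 agree with the start of the reported peptide.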

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 185, P = 5.7e-17

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog";
S.pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 162, P = 4.8e-12

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA,
complete cds., N = 1, Score = 141, P = 8e-10

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09

>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) 

Length = 138

HSPs: 

Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 35/85 (41%), Positives = 56/85 (65%) 



Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 60 

M A +F+ +++ C +F V+F I +DLE DYIN CSK+NK + PE H +++ 
Sbjct: 1 MGAWLFILAVVVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFIFLLNLPVATWNIYRM 85

L L++ +WF+FLLNLPV +N+ ++ 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 37 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 7/9 (77%), Positives = 9/9 (100%)

Query: 82 IYRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 
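
The percentages printed next to the identity and positive counts can be recomputed from the fractions. The sketch below assumes the percentages are truncated rather than rounded, which is what the figures in this report suggest (for example 7/9 is shown as 77%); the helper name `check_hsp_stats` is hypothetical.

```python
import re

# Matches e.g. "Identities = 35/85 (41%)" or "Positives = 56/85 (65%)".
HSP_STAT = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_hsp_stats(line: str) -> dict:
    """Map each statistic to (recomputed_percent, reported_percent)."""
    results = {}
    for name, num, den, pct in HSP_STAT.findall(line):
        recomputed = 100 * int(num) // int(den)   # integer truncation, assumed
        results[name] = (recomputed, int(pct))
    return results


line = "Identities = 35/85 (41%), Positives = 56/85 (65%)"
print(check_hsp_stats(line))
```

A mismatch between the two numbers in a pair is a quick way to spot a digit that was corrupted during text extraction.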


Pedant information for DKFZphutel_22el2, frame 1 



Report for DKFZphutel_22el2.1

[LENGTH] 92
[MW] 10614.98
[pI] 5.04
[HOMOL] PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces cerevisiae) 5e-14
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054c] 2e-15
[PIRKW] transmembrane protein 2e-11
[PROSITE] CK2_PHOSPHO_SITE 3
[KW] SIGNAL_PEPTIDE 33
[KW] TRANSMEMBRANE 2
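
Pedant report lines of this kind can be read into structured records for downstream filtering. The following is only a sketch: it assumes each record sits on one line in the form `[TAG] value` with an optional trailing e-value, and the names `PEDANT_LINE` and `parse_pedant` are hypothetical rather than part of any PEDANT tooling.

```python
import re

# [TAG] free-text value, optionally followed by an e-value such as 2e-11.
PEDANT_LINE = re.compile(
    r"^\[(?P<tag>[A-Za-z_]+)\]\s+(?P<value>.+?)"
    r"(?:\s+(?P<evalue>\d+(?:\.\d+)?e-\d+))?$"
)


def parse_pedant(lines):
    """Return a list of (tag, value, evalue_or_None) tuples."""
    records = []
    for line in lines:
        match = PEDANT_LINE.match(line.strip())
        if match:
            evalue = match.group("evalue")
            records.append(
                (match.group("tag"), match.group("value"),
                 float(evalue) if evalue else None)
            )
    return records


sample = [
    "[LENGTH] 92",
    "[PIRKW] transmembrane protein 2e-11",
    "[KW] TRANSMEMBRANE 2",
]
for record in parse_pedant(sample):
    print(record)
```

Plain counts such as the `2` after `TRANSMEMBRANE` are kept as part of the value because they are not e-values.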

SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 
PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh 
MEM MMMMMMMMMM 



SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND

PRD hhhhhhhheeecccccchhhhhhhhhhhhccc 

MEM MMMMMMMMMMMMMMMMMMM . . MMMMMMM .... 



Prosite for DKFZphutel_22el2.1

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
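
The three casein kinase II hits can be reproduced by scanning the peptide for the PROSITE PS00006 consensus `[ST]-x(2)-[DE]`. This is a minimal sketch: the coordinate convention (1-based start, end reported as start + 4) is inferred from the table above, and the helper name `find_ck2_sites` is hypothetical.

```python
import re

# Zero-width lookahead so that overlapping sites (e.g. 26 and 28) are all found.
CK2_SITE = re.compile(r"(?=[ST]..[DE])")


def find_ck2_sites(peptide: str):
    """Return (start, end) pairs using the table's apparent convention."""
    return [(m.start() + 1, m.start() + 5) for m in CK2_SITE.finditer(peptide)]


PEPTIDE = (
    "MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIP"
    "ELIGHTIVTVLLLMSLHWFIFLLNLPVATWNIYRMILALIND"
)
print(find_ck2_sites(PEPTIDE))  # [(9, 13), (26, 30), (28, 32)]
```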



(No Pfam data available for DKFZphutel_22el2.1)

DKFZphutel_22n2



group: uterus derived 

DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chr11 linkage group"
Insert length: 1556 bp 

Poly A stretch at pos. 1534, no polyadenylation signal found 



1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 

51 ACTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 

101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 

151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 

201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 

251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG 

301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT 

351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC 

401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC 

451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC 

501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTGACCTGG ACCACAAACT 

551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT 

601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG 

651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT 

701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG 

751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG 

801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA 

851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGGTCCC 

901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT 

951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT 

1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 

1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 

1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 

1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCTGCTGTT TCAAGGCTGA 

1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 

1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT 

1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG 

1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA 

1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG 

1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC 

1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 

1551 AAAAAA 



BLAST Results 



Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 255 bp to 1166 bp; peptide length: 304 
Category: putative protein 
1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK 
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW 
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY 
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP 
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET 
301 LTFS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22n2, frame 3 

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 132, P = 1e-05



>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae) 
Length = 562 

HSPs:

Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05
Identities = 24/63 (38%), Positives = 35/63 (55%)

Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 
D 

Sbjct: 557 IID 559 

Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04
Identities = 20/52 (38%), Positives = 33/52 (63%)

Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55 

N+ +E ++E+ +E + T + + N+DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545 



Pedant information for DKFZphutel_22n2, frame 3 



Report for DKFZphutel_22n2.3

[LENGTH] 304
[MW] 34285.85
[pI] 4.37
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 10
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.84 %



SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc 

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc 

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID 

SEG 

PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch 

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI 

SEG 

PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh 

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET

SEG 
PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc

SEQ LTFS 

SEG 

PRD cccc
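
The LOW_COMPLEXITY percentage in the Pedant report appears to be the share of residues masked with `x` on the SEG lines. A minimal sketch under that assumption follows; the helper name `low_complexity_percent` is hypothetical, and the mask below encodes only the two masked runs (10 and 26 residues) visible in the first SEG line.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask, rounded to 2 places."""
    masked = seg_mask.count("x")
    return round(100.0 * masked / length, 2)


# Two masked runs taken from the SEG line of DKFZphutel_22n2.3.
SEG_MASK = "x" * 10 + " " + "x" * 26
print(low_complexity_percent(SEG_MASK, 304))  # 11.84
```

The result agrees with the 11.84 % reported for this 304-residue peptide.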

Prosite for DKFZphutel_22n2.3

PS00001 4->8 ASN_GLYCOSYLATION PDOC00001
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001
PS00001 290->294 ASN_GLYCOSYLATION PDOC00001
PS00004 17->21 CAMP_PHOSPHO_SITE PDOC00004
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006
PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006
PS00006 185->189 CK2_PHOSPHO_SITE PDOC00006
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006
PS00009 280->284 AMIDATION PDOC00009

(No Pfam data available for DKFZphutel_22n2.3)
DKFZphutel_22o2



group: uterus derived 

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



similarity to S.pombe SPBC3E7.03c 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: map="11p15.5"
Insert length: 2714 bp

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677



1 GCAGGGCACG GTGGGGGCTG AGATCGTTTC CTGTTGGAAC TTCTGGCCCA 
51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 
101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 
151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGTTA AAGCGCCACC 
201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 
251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG 
301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG
351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 
401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 
451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC
501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 
551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 
601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 
651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 
701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 
751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 
801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 
851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 
901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 
951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 
1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA 
1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 
1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 
1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 
1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 
1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC 
1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC 
1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC 
1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT 
1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC 
1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 
1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT 
1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 
1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA 
1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG 
1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT 
1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC 
1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG 
1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT 
1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG 
2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT 
2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC 
2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA 
2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC 
2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG 
2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA 
2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG 
2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG 
2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT 
2451 CCTGTCTTCA CTGAGCGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG
2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 
2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 
2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 
2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA 
BLAST Results



Entry AF015416 from database EMBL: 

Homo sapiens chromosome 11 from 11p15.5 region, complete sequence.
Score = 3356, P = 2.0e-144, identities = 672/673

Entry HS263253 from database EMBL:
human STS SHGC-15914.
Score = 1143, P = 9.0e-46, identities = 245/255



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 



1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL 
51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV 
101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL 
151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL 
201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH 
251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST 
301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK 
351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE 
401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD 
451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR 
501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD 
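
The peptide and nucleotide listings in this report use blocks of ten symbols, five blocks per line, each line prefixed with the 1-based position of its first symbol. A short sketch for regenerating that layout from a raw sequence string is given here; the function name `format_blocks` is hypothetical.

```python
def format_blocks(seq: str, block: int = 10, per_line: int = 5) -> str:
    """Lay out a sequence as position-prefixed lines of space-separated blocks."""
    width = block * per_line
    lines = []
    for offset in range(0, len(seq), width):
        chunk = seq[offset:offset + width]
        blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
        lines.append(f"{offset + 1} {blocks}")
    return "\n".join(lines)


# First 60 residues of the peptide above
print(format_blocks("MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP"))
```

Re-flowing an extracted sequence this way makes it easier to compare against the original listing line by line.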



BLASTP hits



No BLASTP hits available



Alert BLASTP hits for DKFZphutel_22o2, frame 2 

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023



>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7.
Length = 362



HSPs: 



Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03
Identities = 71/289 (24%), Positives = 124/289 (42%) 



Query: 215 SQETERAM-EILKVLFNITLDSIKGEVDEEDAALYRHLGTLLRHCVMIATAGDRTEEFHG 273 

SQ+ E + EIL++LF 1+ S E DE+ L L+ + + 

Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILL LVIEIWASSLNNNPNSPLRW 65 

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLI FLEKRLHKTH RL 327 

HA N LL NL L LD + + T + +1 + +LEK L+ + 
Sbjct: 66 HATNALLSFNLQLLSLDQAIYVSEIACQT LQSILISREVEYLEKGLNLCFDIAAKY 121 

Query: 328 KESVAPVLSVLTECARMHRPARKFLKAQGWPPPQVLPPLRDVRTRP-EVGEMLRNKLVRL 386 

+ ++ P+L++L + +LPDR++G+R L+RL 

Sbjct: 122 QNTLPPILAILLSLLSFFNIKQNL SMLLFPTNDDRKQSLQKGKSFRCLLLRL 173 

Query: 387 MT-HLDTDVKRVAAEFLFVLCSESVPRFIKYTGYGNAAGLLAARGLMAGGRPEGQYS 442

+T++ ALLC + + GGAG+ M P+ + 

Sbjct: 174 LTIPIVEPIGTYYASLLNELCDGDSQQIARIFGAGYAMGISQHSETMPFPSPLSKAASPV 233 

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV--EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499 

+ + +E +I+P+TG + +E +++E+KE EA +L +F +L +N 

Sbjct: 234 FQKNSRGQENTEENNLAIDPITGSMCTNRNKSQRLE-LSQEEKEREAERLFYLFQRLEKN 292 
Query: 500 RVIQ 503
IQ
Sbjct: 293 STIQ 296
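
The Expect and P values printed for each HSP are nearly identical because, under the Poisson model commonly used for BLAST statistics, P = 1 - exp(-E), which is close to E when E is small. A minimal sketch of the conversion, with the hypothetical helper name `expect_to_p`:

```python
import math


def expect_to_p(expect: float) -> float:
    """Convert an Expect value to a P value via P = 1 - exp(-E)."""
    # expm1 keeps precision for very small E, where 1 - exp(-E) would round to 0.
    return -math.expm1(-expect)


for e in (2.3e-03, 5.7e-17):
    print(e, expect_to_p(e))
```

For the weak hit above (Expect = 2.3e-03) the difference from P only shows up in the sixth decimal place, which is why the report prints the same figure for both.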



Pedant information for DKFZphutel_22o2, frame 2 



Report for DKFZphutel_22o2.2

[LENGTH] 537
[MW] 60372.53
[pI] 5.20
[BLOCKS] BL00415L Synapsins proteins
[PROSITE] MYRISTYL 4
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 9.50 %



SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP 

SEG 

PRD ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhhhhhhhhhhhhccccc 

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC 

SEG 

PRD cceeeeeccccccccccccccccchhhhhhhhhhhhceeeeccccccccchhrihhhhhhh 

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ 

SEG . xxxxxxxxxxxxxxx . . . 

PRD hhhhccccchhhhhhhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh 

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV 

SEG 

PRD hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccchhh 

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc 

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP

SEG 

PRD eeeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc 

SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG 

SEG xxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc 

SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE 

SEG xxxxxxxxxxxxxxx xxxxxxxxx 

PRD chhhhhhhhhccccccccccccccccccchhhhhhhhhccccccceeecccccccchhhh 

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 



Prosite for DKFZphutel_22o2.2

PS00001 230->234 ASN_GLYCOSYLATION PDOC00001
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005
PS00005 84->87 PKC_PHOSPHO_SITE PDOC00005
PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005
PS00005 145->148 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 463->466 PKC_PHOSPHO_SITE PDOC00005
PS00005 508->511 PKC_PHOSPHO_SITE PDOC00005
PS00006 12->16 CK2_PHOSPHO_SITE PDOC00006
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006
PS00006 388->392 CK2_PHOSPHO_SITE PDOC00006
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006
PS00006 530->534 CK2_PHOSPHO_SITE PDOC00006
PS00008 57->63 MYRISTYL PDOC00008
PS00008 420->426 MYRISTYL PDOC00008
PS00008 424->430 MYRISTYL PDOC00008
PS00008 430->436 MYRISTYL PDOC00008

(No Pfam data available for DKFZphutel_22o2.2)
DKFZphutel_23el3



group: metabolism 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with similarity to 27K heat shock
proteins.

The novel protein contains a serine protease of the subtilase family with an aspartic acid-
containing active site. Subtilases are an extensive family of serine proteases whose catalytic
activity is provided by a charge relay system similar to that of the trypsin family of serine
proteases but which evolved by independent convergent evolution. The sequences around the
residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely
different from those of the analogous residues in the trypsin serine proteases. Thus the novel
protein is a new member of this family.

The new protein can find application in modulation of proteinase activity in cells and as a 
new enzyme for proteomics and biotechnologic production processes. 



heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group"
Insert length: 1854 bp 

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810 



1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC
51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC
101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG
151 GGGGTTACCT TTGGGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC
201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA
251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG
301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC
351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT
401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG
451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC
501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC
551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG
601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT
651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA
701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG
751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG
801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT
851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA
901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC
951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC
1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA
1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG
1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT
1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA
1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC
1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC
1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC
1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT
1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC
1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC
1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT
1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC
1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT
1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA
1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT
1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA
1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAAAA
1851 AAAA



BLAST Results 



Entry HS286348 from database EMBL: 
human STS TIGR-A002J47.
Score = 510, P = 1.2e-16, identities = 102/102

Medline entries



95391379: 

Cloning and sequencing of a cDNA encoding the canine HSP27 protein. 
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 



Peptide information for frame 3 



ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39) 
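
The ORF coordinates, the frame number and the peptide length quoted for each clone are related by simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and exclude the stop codon (an assumption that is consistent with the figures in this listing, not a documented property of the annotation software). The function name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, end: int) -> tuple:
    """Return (reading_frame, peptide_length) for a 1-based, inclusive ORF.

    Assumes the stop codon is not part of the quoted span.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return frame, span // 3


# ORF from 354 bp to 941 bp, as listed above
print(orf_summary(354, 941))
```

For the ORF above this gives frame 3 and a 196-residue peptide, in agreement with the "Peptide information for frame 3" heading and the stated peptide length.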



1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD 

51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV 

101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD 

151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_23el3, frame 3 

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P =
4.3e-27

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus
heat shock protein HSP27 internal deletion variant b mRNA, complete
cds., N = 1, Score = 301, P = 8.9e-27



>PIR:JC4244 heat-shock 27K protein - dog 
Length = 209 

HSPs:



Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27
Identities = 80/182 (43%), Positives = 102/182 (56%) 
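
The percentages in parentheses follow from the match counts. A small sketch, assuming the report truncates rather than rounds (80/182 is 43.96 %, yet 43 % is printed, which is what suggests truncation here); the helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero."""
    return matches * 100 // aligned


print(blast_percent(80, 182))   # identities
print(blast_percent(102, 182))  # positives
```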



Query:    1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58
            M + ++PFS PS DPFRD P SRL D FG+ P++ WW S
Sbjct:    1 MTERRVPFSLLRSPSW DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50

Query:   59 AW PGTLRSGMVP RGPTATARFGVPAEGR -- TPPPFPG EPWKVCVNVHSF 105
            WPG +R +P GP A A PA R + G + W+V ++V+ F
Sbjct:   51 GWPGYVRP--IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108

Query:  106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165
            PEEL VKTKDG VE++GKHEE+Q E G +S+ T K LP VDP V +SLSPEG L
Sbjct:  109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168

Query:  166 IEAPQVPPYSTFGE 179
            +EAP P + E
Sbjct:  169 VEAPMPKPATQSAE 182





Pedant information for DKFZphutel_23el3, frame 3 



Report for DKFZphutel_23el3.3



[LENGTH] 196
[MW] 21604.37
[pI] 5.00
[HOMOL] PIR:JC4244 heat-shock 27K protein - dog 3e-22
[BLOCKS] BL01031C
[PIRKW] blocked amino end 1e-13
[PIRKW] acetylated amino end 4e-13
[PIRKW] phosphoprotein 7e-21
[PIRKW] glycoprotein 2e-11
[PIRKW] heat shock 7e-21
[PIRKW] molecular chaperone 4e-13
[PIRKW] alternative splicing 1e-19
[PIRKW] eye lens 6e-14
[PIRKW] stress-induced protein 7e-21
[SUPFAM] alpha-crystallin 7e-21
[PROSITE] SUBTILASE_ASP 1
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Heat shock hsp20 proteins
[KW] All_Beta
[KW] LOW_COMPLEXITY 7.14 %



SEQ MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW 

SEG xxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc 

SEQ PGTLRSGMVPRGPTATARFGVPAEGRTPPPFPGEPWKVCVNVHSFKPEELMVKTKDGYVE

SEG 

PRD cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeecccceee 

SEQ VSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLIIEAPQVPPYSTFGES 

SEG 

PRD eccchhhhhcccceeeetccccccccccccccceeeecccccceeeeeccccccccccccc 

SEQ SFNNELPQDSQEVTCT 

SEG 

PRD cccccccccceeeccc 
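
The LOW_COMPLEXITY percentage in the report can be related to the SEG mask printed with the sequence. The sketch below assumes that the figure is simply the number of residues masked with "x" divided by the peptide length; this is an inference from the numbers in this listing (14 masked residues out of 196 gives 7.14 %), and the function name is hypothetical.

```python
def low_complexity_percent(seg_mask: str, peptide_length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask, to two decimals."""
    masked = seg_mask.lower().count("x")
    return round(100.0 * masked / peptide_length, 2)


# The SEG line above carries 14 masked positions for a 196-residue peptide.
print(low_complexity_percent("xxxxxxxxxxxxxx", 196))
```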



Prosite for DKFZphutel_23el3.3



PS00001 138->142 ASN_GLYCOSYLATION PDOC00001
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 140->143 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006
PS00008 62->68 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00136 28->39 SUBTILASE_ASP PDOC00125



Pfam for DKFZphutel_23el3.3



HMM_NAME Heat shock hsp20 proteins

HMM       *AMMrpPWDWRE DpDHFeVrMDMPGFKPEEIKVkVEDNNVLvIeG
          A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G
Query  77 ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123

HMM       EHEREEEREDDkWWWHERI YRHFMRRFrLPENVDpDqlkAsMSdNGVLTI
          +HE E++ + + ++ F +++LP +VDP + AS+S++G+L I
Query 124 KHE EKQQ EGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLII 166

HMM       TVPKpEP*
          ++p ++p
Query 167 EAPQVPP 173
DKFZphutel_23gll



group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S. pombe
SPAC31G5.12c and S. cerevisiae Maf1p.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



similarity to SPAC31G5.12c and Maf1p

complete cDNA, complete cds, EST hits

Sequenced by EMBL

Locus: unknown

Insert length: 1674 bp

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644



1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 
51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 
101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 
151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 
201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 
251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG 
301 CCCAGCCAGA CCCGGCCCGG CGCGGCCTGA TCTAACCCAG CCAGGCAGGC 
351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT 
401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA 
451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG
501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC 
551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA 
601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC 
651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC 
701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC 
751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC 
801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC 
851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC 
901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 
951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 
1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 
1051 TGGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 
1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 
1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 
1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 
1251 GCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 
1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 
1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 
1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC 
1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA 
1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 
1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 
1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 
1651 AGTTTCTGTG ACTTAAAAAA AAAA 
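
The poly A stretch and polyadenylation signal positions quoted for this insert can be located directly in the sequence. The sketch below searches only the last 74 bases (rows 1601 and 1651) for the canonical AATAAA hexamer and for the trailing run of A residues. Two things are assumptions rather than statements from the listing: that the signal meant is the canonical AATAAA, and that positions are reported as zero-based offsets (the 1-based positions would be one higher). The function name `find_polya_features` is hypothetical.

```python
import re

# Rows 1601 and 1651 of the DKFZphutel_23gll insert, copied from the listing.
TAIL = (
    "GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA"
    "AGTTTCTGTG ACTTAAAAAA AAAA"
).replace(" ", "")
TAIL_OFFSET = 1600  # number of bases that precede TAIL in the full insert


def find_polya_features(tail: str, offset: int):
    """Return zero-based offsets of (AATAAA signal, trailing poly A run)."""
    signal = tail.find("AATAAA")
    run = re.search(r"A+$", tail)
    signal_pos = offset + signal if signal >= 0 else None
    run_pos = offset + run.start() if run else None
    return signal_pos, run_pos


print(find_polya_features(TAIL, TAIL_OFFSET))
```

Under these assumptions the offsets come out as 1644 for the signal and 1664 for the poly A stretch, matching the values quoted for this clone.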



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 393 bp to 1160 bp; peptide length: 256 
Category: similarity to known protein 
1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE

51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL 

101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL 

151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF

201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR 

251 VPVICI 



BLASTP hits



Entry SPAC31G5_12 from database TREMBL:

gene: "SPAC31G5.12c"; product: "hypothetical protein"; S. pombe
chromosome I cosmid c31G5.

Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127

Entry SPD656_1 from database TREMBL:

product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial
cds.

Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127

Entry S50986 from database PIR:

MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST
MAF1 PROTEIN. >TREMBL:SC19492_1 gene: "MAF1"; product: "Maf1p";
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds.
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S. cerevisiae
chromosome IV cosmid 8119.

Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133

Entry AF098499_2 from database TREMBL:

gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8.

Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252



Alert BLASTP hits for DKFZphutel_23gll, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_23gll, frame 3 



Report for DKFZphutel_23gll.3



[LENGTH] 256
[MW] 28869.95
[pI] 4.51
[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein"; S. pombe chromosome I cosmid c31G5. 4e-23
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c] 6e-13
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 5
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.81 %



SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS 

SEG 

PRD cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc 

SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc 

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED 

SEG 

PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc 

SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc 

SEQ EETSTMEEDRVPVICI 

SEG XX 

PRD cccccccccceeeccc 



Prosite for DKFZphutel_23gll.3



PS00001 6->10 ASN_GLYCOSYLATION PDOC00001
PS00001 101->105 ASN_GLYCOSYLATION PDOC00001
PS00001 132->136 ASN_GLYCOSYLATION PDOC00001
PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005
PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00008 66->72 MYRISTYL PDOC00008
PS00008 181->187 MYRISTYL PDOC00008
PS00008 239->245 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_23gll.3)
DKFZphutel_24cl9

group: transmembrane protein 

DKFZphutel_24cl9 encodes a novel 195 amino acid protein without similarity to known proteins.
The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.

unknown 

membrane regions: 1 

Summary DKFZphutel_24cl9 encodes a novel 195 amino acid protein, with 
no similarity to known proteins. 



complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown

insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 



1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 
51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 
101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA 
151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 
201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 
251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 
301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 
351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT 
401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC
451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA
501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG 
551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT 
601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG 
651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA 
701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA 
751 ACAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 



1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA

51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET

101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR 

151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH 
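
The start of this peptide can be reproduced by translating the insert from the ORF start. The sketch below uses the standard genetic code and bases 125-160 of the DKFZphutel_24cl9 insert, copied from rows 101 and 151 of the listing. The `translate` helper and the compact table construction are illustrative and are not taken from the original annotation pipeline.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 125-160 of the insert (the ORF is listed as starting at 125 bp).
fragment = "ATGGAA AATCATAAAT CCAATAATAA GGAAAACATA".replace(" ", "")
print(translate(fragment))
```

The twelve codons translate to MENHKSNNKENI, the first twelve residues of the peptide listed above, which also confirms that the ORF start coordinate is 1-based.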

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphutel_24cl9, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphutel_24cl9, frame 2 



Report for DKFZphutel_24cl9.2 

[LENGTH] 195 

[MW] 21527.45 

[pI] 9.36

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] TRANSMEMBRANE 1 

SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV 
PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh 
MEM 

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF 
PRD hhhhhhhccccccceeeeecccccccrcccccccccccccccccccccceeeecccceee 
MEM MMMMMMMMMMMMMM 

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL 
PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhcchhhhh 
MEM MMM 

SEQ LIKALQLSEPGKEIH 
PRD hhhhhhhcccccccc 
MEM 



Prosite for DKFZphutel_24cl9.2



PS00001 11->15 ASN_GLYCOSYLATION PDOC00001
PS00001 34->38 ASN_GLYCOSYLATION PDOC00001
PS00001 59->63 ASN_GLYCOSYLATION PDOC00001
PS00005 18->21 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00008 40->46 MYRISTYL PDOC00008
PS00008 47->53 MYRISTYL PDOC00008
PS00008 68->74 MYRISTYL PDOC00008
PS00008 110->116 MYRISTYL PDOC00008
PS00008 127->133 MYRISTYL PDOC00008
PS00008 142->148 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_24cl9.2) 
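
The Prosite site list for DKFZphutel_24cl9.2 can be cross-checked by scanning the peptide with regular-expression equivalents of the four Prosite patterns involved (PS00001, PS00005, PS00006, PS00008). This is a sketch and not the tool used for the original report. The `scan` helper is hypothetical, and the "start->end" convention (1-based start, end one past the last matched residue) is inferred from the positions printed in the report.

```python
import re

# Regular-expression forms of the Prosite patterns:
#   PS00001  N-{P}-[ST]-{P}
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

PEPTIDE = (
    "MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV"
    "TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF"
    "LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL"
    "LIKALQLSEPGKEIH"
)


def scan(peptide: str, pattern: str):
    """Return (start, end) pairs, 1-based start and exclusive end.

    A lookahead is used so that overlapping sites are all reported.
    """
    regex = re.compile("(?=(" + pattern + "))")
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in regex.finditer(peptide)
    ]


for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print(f"{name} {start}->{end}")
```

Run on the 195-residue peptide, the scan reports three ASN_GLYCOSYLATION, three PKC_PHOSPHO_SITE, one CK2_PHOSPHO_SITE and six MYRISTYL sites, which agrees with the counts in the Pedant report and with the positions in the Prosite table.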
DKFZphutel_24ell



group: intracellular transport and trafficking 

DKFZphutel_24ell encodes a novel 226 amino acid protein with similarity to the human/mouse Golgi
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular
membrane-bound compartment. Thus, the novel protein also seems to be involved in nucleotide sugar
transport.

The new protein can find application in modulating the transport of nucleosides and/or
nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-bound
compartment.

similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP

complete cDNA, complete cds, EST hits
potential start at 184
TRANSMEMBRANE 4

function in the transport of nucleosides and/or nucleoside derivatives
between the cytosol and the lumen of an intracellular membrane-bound compartment?
Sequenced by Qiagen
Locus: /map="8"
Insert length: 2005 bp

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963

1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 
51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 

101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 

151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 

201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 

251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 

301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 

351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 

401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT 

451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG

501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 

551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 

601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 

651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 

701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 

751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC 

801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 

851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA 

901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 

951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 
1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT 
1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 
1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 
1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 
1201 TCCCTCTCTT TTGAAAATGT AAAATAAAAC CAAAAATAGA CAACTTTTTC 
1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA 
1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG 
1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 
1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT 
1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG
1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 
1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 
1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 
1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 
1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT 
1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT
1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 
1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT 
1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 
1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA 
2001 AAAAA 



BLAST Results 



Entry HS012351 from database EMBL: 
human STS SHGC-31823.
Score = 1629, P = 3.1e-67, identities = 343/354



Medline entries 



96199248: 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 



Peptide information for frame 1 



ORF from 184 bp to 861 bp; peptide length: 226 
Category: strong similarity to known protein 



1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 

51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 

101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 

151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT 

201 TVLLPPYDDA TVNGAAKEPP PPYVSA 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_24ell, frame 1

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP., N
= 1, Score = 539, P = 5.3e-52

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06

>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108).

Length = 233

HSPs:

Score = 551 (82.7 bits), Expect = 2.9e-53, P = 2.9e-53
Identities = 102/221 (46%), Positives = 148/221 (66%)
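
The Expect and P values in this HSP header are printed as the same number. Under the usual Poisson model for chance alignments, the probability of at least one hit is P = 1 - exp(-E), which is numerically indistinguishable from E when E is very small. A short sketch of that relation, with a hypothetical helper name:

```python
import math


def p_from_expect(expect: float) -> float:
    """P-value for at least one chance hit, given an Expect value (Poisson model)."""
    # expm1 keeps precision for very small arguments.
    return -math.expm1(-expect)


print(p_from_expect(2.9e-53))  # effectively equal to the Expect value itself
print(p_from_expect(1.0))      # the two measures diverge for larger Expect values
```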

Query:    9 RFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQY NFSSSELGGDF
            RFYS CC CCHVRTGT I +LG WY+++N ++ ++L + P+ N +G +
Sbjct:   13

Query:   65
            E M D N C+ A+S+LM +I +M YGA + W+IPFFCY++FDF L+ LVAI+ L
Sbjct:   73 SERMAD-NACVLFAVSVLMFIISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL

Query:  124 IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI
            Y I+EY+ QLP +FPY+DD+++++ +CL+ I+L+F ++ + FK YLI+CVWNCY+YI
Sbjct:  132 TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFIVLVFFALFIIFKAYLINCVWNCYKYI

Query:  184 NGRNSSDVLVYVTSN-DTTVLLPPYDDATVNGAAKEPPPPYVSA 226
            N RN ++ VY +LP Y+ A V KEPPPPY+ A
Sbjct:  191 NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233

Pedant information for DKFZphutel_24ell, frame 1

Report for DKFZphutel_24ell.1



[LENGTH] 226
[MW] 25419.11
[pI] 4.65
[HOMOL] SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 3
[KW] SIGNAL_PEPTIDE 49
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 20.80 %



SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL 

SEG xxxxxxxxxxxxxxxx 

PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc 

MEM 

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI 

SEG XXXXXXXXXXXXXXXXXX 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY 

SEG xxxxxxxxxxxxx 

PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM .... 

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA 

SEG 

PRD eecccccccceeeeeecccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphutel_24ell.1



PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 187->191 ASN_GLYCOSYLATION PDOC00001
PS00001 198->202 ASN_GLYCOSYLATION PDOC00001
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00007 186->195 TYR_PHOSPHO_SITE PDOC00007



(No Pfam data available for DKFZphutel_24ell.1)
DKFZphutel_24j6



group: cell structure and motility 

DKFZphutel_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell
adhesion regulator (CAR1).

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of
cell-cell adhesion. It contains an RGD cell attachment site.

The new protein can find application in modulation of cell-cell adhesion.



strong similarity to rat CAR1 A.thaliana T19C21.5

complete cDNA, complete cds, EST hits

potential frame shift at Bp 1241 according to CAR1

but frame shift might be in CAR1 sequence!

ESTs T73366 AA362984 confirm this sequence 

Sequenced by Qiagen 

Locus: /map="939.9 cR from top of Chr2 linkage group" 
Insert length: 3333 bp 

Poly A stretch at pos. 3316, no polyadenylation signal found 



1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 
51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 
101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 
151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 
201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 
251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG 
301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 
351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 
401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG 
451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA
501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 
551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 
601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG 
651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 
701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 
751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT
801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG 
851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 
901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 
951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 
1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 
1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 
1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 
1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 
1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 
1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 
1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 
1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 
1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 
1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA 
1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 
1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 
1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 
1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC 
1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT 
1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC 
1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 
1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 
1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 
1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 
2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 
2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 
2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 
2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 
2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 
2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT 
2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT 
2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC 
2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT 
2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA 
2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG 
2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA
2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT 
2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT 
2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT 
2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC 
2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG 
2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG
2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC 
2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT 
3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC 
3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT 
3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA 
3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT 
3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC 
3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA 
3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA 



BLAST Results 



Entry HS389210 from database EMBL: 
human STS SHGC-10164. 
Score = 1592, P = 1.5e-64, identities = 346/364

Entry HS933343 from database EMBL:
human STS WI-16551.
Score = 1193, P = 5.7e-46, identities = 241/244



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 



1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL 
51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN 
101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT 
151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS
201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN 
251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV 
301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA 
351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL 
401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV 
451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN 
501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL 
551 FACGPDAKEV RKENQANTSV V 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24j6, frame 3

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds., N
= 1, Score = 1472, P = 7.2e-151

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437, P
= 2.8e-60

TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid
R09B5., N = 2, Score = 323, P = 1.5e-43



>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";

Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.
Length = 405
HSPs:



Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151
Identities = 288/319 (90%), Positives = 297/319 (93%)



Query:    1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60
            MT++ D Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
Sbjct:    1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60

Query:   61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120
            TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL
Sbjct:   61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120

Query:  121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180
            L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI
Sbjct:  121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180

Query:  181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240
            DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK
Sbjct:  181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240

Query:  241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300
            EE+ELKQL KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV
Sbjct:  241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300

Query:  301 SYYNQPVFLAGMGLAF-LY 318
            SYYNQPVFL G F LY
Sbjct:  301 SYYNQPVFLGWHGPGFPLY 319





Pedant information for DKFZphutel_24j6, frame 3 



Report for DKFZphutel_24j6.3 



[LENGTH] 571 
[MW] 62542.72 
[pI] 6.08 
[HOMOL] TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141 
[BLOCKS] BL00341D 
[PROSITE] MYRISTYL 15 
[PROSITE] MITOCH_CARRIER 1 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] PROKAR_LIPOPROTEIN 
[PROSITE] PKC_PHOSPHO_SITE 
[PROSITE] ASN_GLYCOSYLATION 
[PFAM] Laminin B (Domain IV) 
[KW] TRANSMEMBRANE 4 
[KW] LOW_COMPLEXITY 



SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 

SEG 

PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee 

MEM MMMMMMMMMMMMM 



SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 

SEG . XXXXXXXXXXXXXXXX 

PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 

SEG XXXXXXXXXXXXXXXXXXXXX 

PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh 

MEM MMMMMMM 



SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 

SEG 

PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh 

MEM 



SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 

SEG 

PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee 

MEM 

SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF 

SEG 

PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh 

MEM 

SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP 

SEG xxx 

PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc 

MEM 

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL 

SEG xxxxxxxxxx 

PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR 

SEG 

PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV 

SEG 

PRD eecccccceeeeccccchhhhhhhhcccccc 

MEM 



Prosite for DKFZphutel_24j6.3 



PS00001 100->104 ASN_GLYCOSYLATION PDOC00001 
PS00001 174->178 ASN_GLYCOSYLATION PDOC00001 
PS00001 434->438 ASN_GLYCOSYLATION PDOC00001 
PS00001 567->571 ASN_GLYCOSYLATION PDOC00001 
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005 
PS00005 176->179 PKC_PHOSPHO_SITE PDOC00005 
PS00005 294->297 PKC_PHOSPHO_SITE PDOC00005 
PS00005 487->490 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006 
PS00006 294->298 CK2_PHOSPHO_SITE PDOC00006 
PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006 
PS00006 403->407 CK2_PHOSPHO_SITE PDOC00006 
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006 
PS00008 12->18 MYRISTYL PDOC00008 
PS00008 65->71 MYRISTYL PDOC00008 
PS00008 76->82 MYRISTYL PDOC00008 
PS00008 193->199 MYRISTYL PDOC00008 
PS00008 267->273 MYRISTYL PDOC00008 
PS00008 311->317 MYRISTYL PDOC00008 
PS00008 336->342 MYRISTYL PDOC00008 
PS00008 339->345 MYRISTYL PDOC00008 
PS00008 353->359 MYRISTYL PDOC00008 
PS00008 368->374 MYRISTYL PDOC00008 
PS00008 373->379 MYRISTYL PDOC00008 
PS00008 435->441 MYRISTYL PDOC00008 
PS00008 461->467 MYRISTYL PDOC00008 
PS00008 490->496 MYRISTYL PDOC00008 
PS00008 494->500 MYRISTYL PDOC00008 
PS00013 122->133 PROKAR_LIPOPROTEIN PDOC00013 
PS00215 404->414 MITOCH_CARRIER PDOC00189 



Pfam for DKFZphutel_24j6.3 



HMM_NAME Laminin B (Domain IV) 

HMM *YWRlPERFLGDQvTsYGGkLe* 

Y+R + LG+++ + G + + 
Query 538 YFRFAQNTLGNKLFACGPDAK 558 



DKFZphutel_2h3 



group: differentiation/development 

DKFZphutel_2h3 encodes a novel 267 amino acid protein with similarity to ITM2 (integral 
membrane protein 2) of chicken and mouse. 

The novel protein contains a prenyl group binding site (CAAX box) and appears to be 
post-translationally modified by the attachment of either a farnesyl or a geranylgeranyl group. 
The similar Gallus gallus protein E25 is a marker for chondro-osteogenic differentiation. 

The new protein can find application as a marker for chondro-osteogenic cell 
differentiation and for the modulation of chondro-osteogenic cell differentiation. 



strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus: unknown 

insert length: 2033 bp 

Poly A stretch at pos. 2007, polyadenylation signal at pos. 1986 



1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCGGCG 
51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG CATCAAGGGC 
101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC 
151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT 
201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG 
251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA 
301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC 
351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA 
401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC 
451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC 
501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC 
551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT 
601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC 
651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG 
701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG 
751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG 
801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG 
851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC 
901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTGC TTAGCTTGTA 
951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC 
1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT 
1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT 
1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG 
1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG 
1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC 
1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG 
1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTGGGGCTT GAGCTGAGGA 
1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC 
1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG 
1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT 
1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC 
1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA 
1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC 
1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA 
1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT 
1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG CTTAGGGGAA 
1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT 
1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC 
1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT 
1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT 
2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG 
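
The poly A stretch and polyadenylation signal positions quoted for this insert can be re-derived from the sequence itself. The sketch below is an illustration, not the tool that produced the report: `find_polya_features` is a hypothetical helper, the choice of AATAAA/ATTAAA as signal hexamers and the minimum tail length of 10 are assumptions, and the positions it returns are 0-based, which is the convention that happens to reproduce the numbers printed above.

```python
import re

# Canonical polyadenylation hexamer and its most common variant (assumed set).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_features(seq, offset=0, min_tail=10):
    """Return (tail_start, signal_start) as 0-based positions.

    Either value is None when the feature is absent. Digits and blanks from
    a printed listing are stripped before searching.
    """
    seq = re.sub(r"[^ACGTN]", "", seq.upper())
    runs = list(re.finditer(r"A{%d,}" % min_tail, seq))
    if not runs:
        return None, None
    tail_start = runs[-1].start()  # last long run of A is taken as the tail
    # Nearest signal hexamer lying entirely upstream of the tail.
    signal = max(seq.rfind(sig, 0, tail_start) for sig in POLYA_SIGNALS)
    return offset + tail_start, (offset + signal if signal >= 0 else None)


# Last two printed rows of the insert (residues 1951-2033), so offset = 1950.
TAIL = (
    "CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT "
    "ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG"
)
print(find_polya_features(TAIL, offset=1950))  # (2007, 1986)
```

Passing only the 3' end with an offset keeps the example short; the full insert with `offset=0` should give the same positions.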



BLAST Results 



Entry B64417 from database EMBL: 

CIT-HSP-2023A7.TR CIT-hsp Homo sapiens genomic clone 2023A7. 
Length = 715 
Plus Strand HSPs: 

Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64 
Identities = 310/311 (99%) 



Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA 
library subtraction. Molecular cloning and characterization of a gene 
belonging to a novel multigene family of integral membrane proteins. 



Peptide information for frame 2 



ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 

51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 

101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 

151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 

201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 
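
The ORF coordinates and peptide length can be cross-checked against the nucleotide listing. A minimal sketch, assuming the ORF coordinates are 1-based, inclusive and exclude the stop codon (which is what makes 56..856 come out at 267 codons), and using the standard genetic code; `translate` and `orf_codons` are hypothetical helper names:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    dna = "".join(ch for ch in dna.upper() if ch in "ACGT")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def orf_codons(start, end):
    """Codon count for an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


# Bases 56-90 of the insert, copied from the listing above.
print(translate("ATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG"))  # MVKISFQPAVA
print(orf_codons(56, 856))  # 267
```

The same coordinate check can be applied to the other clones in this listing.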

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_2h3, frame 2 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16)., N = 1, Score = 573, P = 1.3e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 
1, Score = 560, P = 3.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, 
Score = 456, P = 3.3e-43 



>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16). 

Length = 262 

HSPs: 

Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55 
Identities = 117/264 (44%), Positives = 172/264 (65%) 



Query:   1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60 
           MVK+SF A+A + A+K ++ ++L+ P ++P G 
Sbjct:   1 MVKVSFNSALA--HKEAANKEEENS QVL I L P P DAK E P ED V W PAGH KRAWCWC 51 

Query:  61 LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLSSQVRTQM-- 112 
           + G+ +L G++ Y + Y+YF Q + CG+ Y ED LS +Q+++ 
Sbjct:  52 MCFGLAFMLAGVILGGAYLYKYFAFQQ GGV YFCGI KYI EDGLSLPESGAQLKSARYH 108 

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172 
            +E++++I  +E+ E I+VPVP+F   DPADI+HDF R LTAY D+SLDKCYVI LNT++ 
Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168 

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232 
           V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R 
Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228 

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 
           +   + I KR A NC  IRHFEN F +ETLIC 
Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 
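
The identity figure reported for an HSP is a count of alignment columns where query and subject carry the same residue. A minimal sketch of that count, applied to the final ungapped block of the alignment above (query 233-264 against subject 229-260); `alignment_stats` is a hypothetical helper, and the "Positives" figure is not reproduced because it depends on a scoring matrix that the report does not name:

```python
def alignment_stats(query, sbjct):
    """Return (identities, columns, percent identity) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same number of columns")
    columns = len(query)
    # A column counts as an identity only when both rows hold the same residue.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identities, columns, round(100 * identities / columns)


query = "RATRRRINKRGAKNCNAIRHFENTFVVETLIC"
sbjct = "KEAMKGIQKREAVNCRKIRHFENRFAMETLIC"
print(alignment_stats(query, sbjct))  # (18, 32, 56)
```

Summing the per-block counts over a whole HSP should reproduce the header figure, provided the gap characters lost in extraction are restored first.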








Pedant information for DKFZphutel_2h3, frame 2 



Report for DKFZphutel_2h3.2 



[LENGTH] 267 
[MW] 30253.96 
[pI] 8.16 
[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16) 1e-49 
[PROSITE] MYRISTYL 4 
[PROSITE] PRENYLATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 15.36 % 



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . .xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG XX 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphutel_2h3.2 



PS00001 169->173 ASN_GLYCOSYLATION PDOC00001 
PS00004 50->54 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005 
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005 
PS00005 227->230 PKC_PHOSPHO_SITE PDOC00005 
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006 
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 
PS00007 119->127 TYR_PHOSPHO_SITE PDOC00007 
PS00008 52->58 MYRISTYL PDOC00008 
PS00008 71->77 MYRISTYL PDOC00008 
PS00008 138->144 MYRISTYL PDOC00008 
PS00008 243->249 MYRISTYL PDOC00008 
PS00294 264->268 PRENYLATION PDOC00266 
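
Short Prosite patterns such as these can be re-scanned with ordinary regular expressions. The sketch below is an illustration under stated assumptions: the two patterns are the commonly cited forms N-{P}-[ST]-{P} (PS00001) and [ST]-x-[RK] (PS00005), `scan` is a hypothetical helper, and hits are reported as a 1-based start with an end one past the last residue, which is the convention that reproduces the ranges printed in the table.

```python
import re

# Pattern name -> (regular expression, pattern width in residues).
PATTERNS = {
    "ASN_GLYCOSYLATION": ("N[^P][ST][^P]", 4),
    "PKC_PHOSPHO_SITE": ("[ST].[RK]", 3),
}

# Frame 2 peptide of DKFZphutel_2h3 (267 residues).
PEPTIDE = (
    "MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSK"
    "RGSSVGGVCYLSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY"
    "EDSLSSQVRTQMELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQR"
    "GLTAYHDISLDKCYVIELNTTIVLPPRNFWELLMNVKRGTYLPQTYIIQE"
    "EMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRINKRGAKNCNAI"
    "RHFENTFVVETLICGVV"
)


def scan(peptide, name):
    """Return (start, end) pairs for every match, overlapping ones included."""
    regex, width = PATTERNS[name]
    # A lookahead lets overlapping matches be reported separately.
    lookahead = re.compile("(?=%s)" % regex)
    return [(m.start() + 1, m.start() + 1 + width) for m in lookahead.finditer(peptide)]


print(scan(PEPTIDE, "ASN_GLYCOSYLATION"))  # [(169, 173)]
print(scan(PEPTIDE, "PKC_PHOSPHO_SITE"))
# [(49, 52), (209, 212), (227, 230), (235, 238)]
```

Patterns this short match by chance in most proteins, so a hit is a candidate site rather than evidence of modification.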



(No Pfam data available for DKFZphutel_2h3.2) 



DKFZphmcf1_1a11 



group: transmembrane protein 

DKFZphmcf1_1a11 encodes a novel 393 amino acid protein with weak similarity to S. pombe 
SPBC29A3_3 protein and S. cerevisiae putative membrane protein YDR255c. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary 
carcinoma-specific genes and as a new marker for mammary carcinoma cells. 



similarity to YDR255c and SPBC29A3.03c 
membrane regions: 1 

Summary: DKFZphmcf1_1a11 encodes a novel 393 amino acid protein with 
similarity to YDR255c and SPBC29A3.03c. 



similarity to YDR255c and SPBC29A3.03c 

complete cDNA, complete cds, EST hits 

potential start at Bp 110 matches Kozak consensus 

Sequenced by DKFZ 

Locus: /map="542.7 cR from top of Chr5 linkage group" 
Insert length: 1819 bp 

Poly A stretch at pos. 1808, no polyadenylation signal found 
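
The note that the start at Bp 110 matches the Kozak consensus can be checked by looking at the two positions that matter most in that consensus: a purine at -3 and a G at +4 relative to the A of ATG. The sketch below uses that common rule of thumb, which is an assumption rather than the criterion documented for this report; `kozak_context` and its strong/adequate/weak labels are hypothetical.

```python
def kozak_context(seq, atg_pos):
    """Classify the context of the ATG whose A sits at 1-based `atg_pos`."""
    seq = "".join(ch for ch in seq.upper() if ch in "ACGT")
    i = atg_pos - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    if i < 3 or i + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    purine_at_minus3 = seq[i - 3] in "AG"
    g_at_plus4 = seq[i + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


# Bases 101-120 of the insert; the ATG at Bp 110 is the 10th base here.
print(kozak_context("GAGGCCACCA TGGAGCAGTG", 10))  # strong
```

The flanking bases here read GCCACCATGG, so both key positions agree with the consensus.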



1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC 
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT 
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT 
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG 
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC 
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG 
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG 
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC 
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA 
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC 
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT 
551 GTGGACTTGG ATTTCAAGCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA 
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC 
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC 
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA 
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC 
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG 
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT 
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG 
951 AGTCCCCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG 
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG 
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT 
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT 
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC 
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA 
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT 
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGTCT 
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG 
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT 
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA 
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG 
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT 
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC 
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG 
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC 
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA 
1801 ACAAATGTAA AAAAAAAAA 



BLAST Results 



Entry HS579359 from database EMBL: 
human STS WI-6350. 
Score = 1027, P = 9.9e-40, identities = 207/209 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 1288 bp; peptide length: 393 
Category: similarity to unknown protein 



1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG 
51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE 
101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL 
151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH 
201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS 
251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN 
301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP 
351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1a11, frame 2 

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P = 
3.4e-42 

PIR:S67312 probable membrane protein YDR255c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 271, P = 5.3e-22 

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid 
T07D1., N = 1, Score = 193, P = 5.6e-13 

>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c29A3. 
Length = 398 

HSPs: 

Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 55/142 (38%), Positives = 89/142 (62%) 



Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311 
           Y  +LD   W  +   F R+ C+ LG+S+ESPL +   +G +ALP+L+ + ++++++ 
Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIVVNAGAIALPILLKMSSIMKKKHTE 316 

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371 
             W  + ELP+EI L     +HSVF CP+ ++Q ++ NPP+ + CGHVI +++L +L 
Sbjct: 317 --WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374 

Query: 372 G--KLKCPYCPMEQNPADGKRIIF 393 
           G  + KCPYCP E   AD  R+ F 
Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398 

Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 51/221 (23%), Positives = 102/221 (46%) 




Query:  22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81 
           G C L EL +++L+P++LVCK+ L K 
Sbjct:  15 GNKCLAKLNEL ESILKDAKKSCLKD-PTTSMKELVA -- CSEKTQQVFDDLKRTEKK 67 

Query:  82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141 
           H+S++R GK +++ F+ ++ + + +++++++ + A+ H ++QG + +A C 
Sbjct:  68 FHTSLNRFGKTLEKKFKFDLEDIKLHSSFESKKRE IDTALSLHFFRQGDVELAHLFC 124 

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201 
           +E+ + + F L I + + + ++DL +EWA R L SSLE+ L + 
Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184 

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242 
           + K+A+YR+ F + H +IQ M +L + 
Sbjct: 185 VSNYL--TTKDIMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225 
Pedant information for DKFZphmcf1_1a11, frame 2 



Report for DKFZphmcf1_1a11.2 



[LENGTH] 393 
[MW] 44414.77 
[pI] 6.15 
[HOMOL] TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR255c] 8e-23 
[PIRKW] transmembrane protein 2e-21 
[PROSITE] MYRISTYL 2 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 1 





SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh 

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS 

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 

PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI 

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhccccccccccccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF 

PRD eehhhhhhhccccccccccccccchhhhhcccc 

MEM 



Prosite for DKFZphmcf1_1a11.2 



PS00001 189->193 ASN_GLYCOSYLATION PDOC00001 
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00007 211->219 TYR_PHOSPHO_SITE PDOC00007 
PS00007 27->36 TYR_PHOSPHO_SITE PDOC00007 
PS00007 244->253 TYR_PHOSPHO_SITE PDOC00007 
PS00008 37->43 MYRISTYL PDOC00008 
PS00008 50->56 MYRISTYL PDOC00008 
PS00009 387->391 AMIDATION PDOC00009 
PS00013 282->293 PROKAR_LIPOPROTEIN PDOC00013 



(No Pfam data available for DKFZphmcf1_1a11.2) 



DKFZphmcf1_1c23 



group: mammary carcinoma derived 

DKFZphmcf1_1c23.1 encodes a novel 311 amino acid proline-rich protein. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary 
carcinoma-specific genes. 



unknown, proline-rich protein 

complete cDNA, complete cds? potential start at Bp 50, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3077 bp 

Poly A stretch at pos. 3067, polyadenylation signal at pos. 3048 



1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT 
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG 
101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT 
151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA 
201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC 
251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT 
301 GGGCCTCCCA GGGAGGACGT AGGTGCCCCC CTGGTCACGC CCTCGCTCCT 
351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC 
401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA 
451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT 
501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC 
551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC 
601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA 
651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG 
701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG 
751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC 
801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC 
851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG 
901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT 
951 CCCGGGCTCA GACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC 
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT 
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC 
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC 
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT 
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA 
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG 
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC 

1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT 
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG 
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT 
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA 
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG 
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT 
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC 
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG 
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT 
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT 
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG 
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT 
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT 
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT 
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC 
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT 
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC 
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC 
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT 
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC 
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG 
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT 
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT 
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC 
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC 
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT 
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA 
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA 
2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC 
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG 
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAAAAGC CTCTGATTTT 
2901 TAATTTCCAT AAAATGTTAG AAGTATATAT ATACATATAT ATATTTCTTT 
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA 
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT 
3051 TAAATGATTG AAACTTGAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 



1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 

51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 

101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 

151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 

201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 

251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1c23, frame 1 

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P = 
6.1e-15 

PIR:A28996 proline-rich protein M14 precursor - mouse, N = 1, Score = 
191, P = 3.8e-13 



>PIR:S49915 extensin-like protein - maize 
Length = 1,188 

HSPs: 

Score = 215 (32.3 bits), Expect = 6.1e-15, P = 6.1e-15 
Identities = 81/269 (30%), Positives = 115/269 (42%) 



Query:   5 PPPEEAFFS VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP--DPPP A 55 
           PPP S V SP P P SP PA +SS ++ PP +P PPP + 
Sbjct: 598 PPPPAPVASPPPPVKSPPPPTPVASPP PPAPVASSPPPMKSPPPPTPVSSPPPPEKS 654 

Query:  56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 
           PPPPASP +P P K PP++P+PS + P 
Sbjct: 655 PPPPPPAKSTPPP-EEYPT--PPTSVKSSPPPEKSLPPPTLIPSPPPQEKPTPPSTPSKP 711 

Query: 116 PTPALGPSAPQKPLRRA-LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 
           P+ PS P++P+ + ++SP PAP S +LA S + + PP 
Sbjct: 712 PSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPPPTPVSSPPALAPVSSPPSVKSSPPPA 771 

Query: 175 AEPRPPQSPASTASFIFSKGSRKLQLERPV-SPETQADLQRNLVAELRSISEQRPPQAPK 233 
           PP +P +S +Q+ P +P++ L V + + + PP AP 
Sbjct: 772 PLSSPPPAPQVKSS PPPVQVSSPPPAPKSSPPLAP-- VSSPPQVEKTSPPPAPL 823 

Query: 234 KSPKAPPPVARKPSVGV--PPPASPSYPRAEPLTAPPTNGLP 273 
           SP P + P V V PPP S P P+++PP P 
Sbjct: 824 SSPPLAPK-SSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Score = 206 (30.9 bits), Expect = 9.1e-14, P = 9.1e-14 
Identities = 82/261 (31%), Positives = 108/261 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS--- SSATALQIQPPGSPDPPPAP— PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P P P P +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+ P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQS PASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP — PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVARKPS- 247 

+ S L P P ++ VA + PP P SP P PVA P 

Sbjct: 578 PVKSPPPPTLVASPP— PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635 

Query: 248 VGVPPP ASPSYPRAEPLTAPPTNGLPHTQD 277 

+ PPP +SP P P PP P + + 
Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669 

Score = 202 (30.3 bits), Expect = 2.9e-13, P = 2.9e-13 
Identities = 81/254 (31%), Positives = 110/254 (43%) 

Query: 16 SPEPAGPSGSPELV— SSP — AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70 

SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+ 
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKPA SPPAHVS 872 

Query: 71 KLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQ 126 

p+ P + PP E +P TP L ++S P +P + P + 

Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932 

Query: 127 KPLRRAL SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

P+ + + ++SP PAP S A K+ A L P PPE + PP +P 
Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP--KSSPPPAPVNL P— PPEVKSSPPPTP 984 

Query: 184 ASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVA 243 

S+ + P PE ++ V+ + PP AP SP PPPV 

Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PPPVK 1042 

Query: 244 RKPS VGVPPPASPSYPRAEPLTAPP 268 

P V PPP S P P+++PP 
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070 

Score = 190 (28.5 bits), Expect = 7.9e-12, P = 7.9e-12 
Identities = 74/264 (28%), Positives = 111/264 (42%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63 

ppp S PE + P P +P + T +++ PP PP P+P 

Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698 

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123 

P K P K PP+E V +P TP V +P PTP P 

Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK--SSPPPAPVSSP--PPTPVSSPP 753 

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

A P+ S ++SP PAP S A ++K+ + + +• P PP + PP +P 
Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP — PPAPKSSPPLAP 806 

Query: 184 ASTASFIFSKGSRKLQLERP-VSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPV 242 

S+ + LP ++P++ +V+ t + PP AP SP P 

Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP--HVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268 

A P+ V PP P++P P +EP ++PP 

Sbjct: 865 ASPPAHVSSPPEVVKPSTPPAPTTVISPPSEPKSSPP 901 

Score = 189 (28.4 bits), Expect = 1.0e-11, P = 1.0e-11 
Identities = 86/271 (31%), Positives = 112/271 (41%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPEL-VSSP— AASSSSATALQIQPPG--SPDPPPAP 56 

PPP AS P P S P + VSSP A SS A PP PPPAP 
Sbjct: 768 PPP--APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 825 

Query: 57 PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 116 

P AP SS P V P PV S PP V +P +TP V +P 

Sbjct: 826 PPLAPKSSPPHVVVSSPP--PVVKSS PPPAPVSSPPLTPKPASPPA--HVSSPPEVV 878 

Query: 117 TPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKAC-SLAASEGL SSAQP 169 

P+ P AP + ++SP P P S V+ ++ +S + SS P 
Sbjct: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPWV 937

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228

+ PP + PP +P S+ + P PE ++ V+ + P

Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268

P AP SP PPP + P V PPP S P P+++PP

Sbjct: 997 PPAPMSSP--PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038




Score = 181 (27.2 bits), Expect = 8.8e-11, P = 8.8e-11
Identities = 73/277 (26%), Positives = 105/277 (37%)

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP +

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111

PPAP + SPV++ PKP + GPP+ P P ++S

Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584

Query: 112 PG--GAPTPALGPSAPQKPLRRA LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166

P +PP+ PP+ + PPS AV ++ +

Sbjct: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644

Query: 167 AQPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQ 226

PPE P PP PA + + ++ PE L+ +

Sbjct: 645 VSSPPPPEKSP-PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268

PP P K P +P P K V PPP S P P+++PP

Sbjct: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745




Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10
Identities = 78/264 (29%), Positives = 105/264 (39%)

Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP— DPPPAP-- 56 

PPP +P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEVVK-PSTPPAPTTV— ISPPSEPKSSPPPTPVS 906 

Query: 57 - P A P AP AS S A PG H V AK L PQKE PVGC S KGGG P P RE DVGA P LVT P S LLQMV RLRS VGA PGGA 115 

P P SS P + P P PP V +P P++ V +P 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPWVSSP— PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

p + p+ -p ++SP PPS A + S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV KSSPPPTPVSSPPPAPK SSPPPAPMSSP-P— PPEV 1009 

Query: 17 6 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

+ PP +P S+ + P P ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 

Query: 236 PKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 2 68 

P PPPV P V PPP S P P+++PP 
Sbjct: 1069 P — PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 82/267 (30%), Positives = 110/267 (41%)



Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69

P P G P SP + PAAS+ ST + P P+P P P PPP +P

Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128

+P PV G S P V P + +V+L AP G+P P + ++P P

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188

+ G SP P P S + +K+ AG + P PPE P PP AS

Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP--PPEKSPPPPAPVASPPP 577

Query: 189 FIFSKGSRKLQLERPV SPETQADLQRNLVAELRS ISEQRPPQA PK 233

+ S L P SPA+ + ++S ++ PP P

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636

Query: 234 KSPKAPPPVARKP SVGVPPPASPSYPRAEPLTAPPTN 270

KSP P PV+ P PPP + S P E PPT+

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTS 676




Score = 170 (25.5 bits), Expect = 1.6e-09, P = 1.6e-09
Identities = 78/279 (27%), Positives = 108/279 (38%)
Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64

pp S S + P +P + P SS A+ PP +P +PP P SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS--SPP-PVVVSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG— GAPTPALGP 122 

p V P PV PP +P PL ++S P +P PA 

Sbjct: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG--RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

Sbjct: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS--P--PPPVKSPPP 1046 

Query: 181 QS PASTAS FIFSKGSRKLQLERPVSPETQADLQRNLVAELRS I SEQRPPQAPKKSPKAPP 240 

+ p S+ + P P ++ V+ + PP AP SP PP 

SbjCt: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP— PP 1103 

Query: 241 PVARKPS VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA ?S P P+ + + PP P + ++ L 
Sbjct: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits), Expect = 2.1e-09, P = 2.1e-09
Identities = 75/266 (28%), Positives = 104/266 (39%)

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP— PPPEKSPPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P p P P ++ P PAP + V+ S ++S P P + 

Sbjct: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP--PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK — PS VGVPPPAS PS YPRA--EPLTAPP 268 

P +PPP + PS PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits), Expect = 2.7e-09, P = 2.7e-09
Identities = 75/267 (28%), Positives = 102/267 (38%)

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

Sbjct: 496 ASTPPP— SLVKLSPPQAPVGSPPPPVKTTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

SbjCt: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

Sbjct: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 179 P P--QS PASTAS FIFSKGSRKLQLERPV— SPETQADLQRNLVAELRS I SEQRPPQAPK 233 

PP+PSSKLPSPQ S + + P +P 

SbjCt: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP--SPP 721 

Query: 234 K S PKAP P P VARK PS VGVPPPAS PS YPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
SbjCt: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits), Expect = 4.6e-09, P = 4.6e-09
Identities = 81/268 (30%), Positives = 108/268 (40%)

Query: 5 PPPEEAF FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP-- 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 — APPAPAPASSAPGHVAKLPQKEPVGC SKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP +S+P + P PV K PP P ++S 

SbjCt: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679 

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P +PPLPSP P + + ++P PSS + + S SS 
Sbjct: 680 SPPPEKSLPPPTLIPSPP--PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 
Query: 168 QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227
P PPSP+A+SSK P+P+ ++ + 



PP APK SP P+A P V PP 



Sbjct : 


737 


Query: 


228 


Sbjct : 


794 


Score 


- 165 


Identities 1 


Query: 


5 


Sbjct: 


517 


Query : 


61 


Sbjct: 


571 


Query: 


115 


Sbjct: 


631 


Query: 


172 


Sbjct: 


689 


Query: 


232 


Sbjct: 


740 


Score 


- 162 


Identities « 


Query: 


2 


Sbjct: 


427 


Query: 


61 


Sbjct: 


487 


Query: 


119 


Sbjct: 


537 


Query: 


175 


Sbjct: 


595 


Query: 


235 


Sbjct: 


654 


Score 


- 159 


Identities « 


Query: 


5 


Sbjct: 


916 


Query : 


60 


Sbjct: 


967 


Query: 


120 


Sbjct: 


1025 


Query: 


176 


Sbjct: 


1085 


Query: 


236 


Sbjct: 


1136 


Score 


= 143 



(24.8 bits), Expect - 6.0e-09, P - 6.0e-09 
79/264 (29%), Positives - 105/264 (39%) 



+ + P P G PS P +VS PS P GSP PP +PP PA 



-GGPPREDVGAP L VT P S LLQMVRL RS VGA PGG 114 

PP V +P + +P V AP 



P SPE + + V+ + PP A 

-PSSPEKPSP-PKEPVSSPPQTPKSSPPPA 739 



SP P PV+ P++ 



76/272 (27%), Positives =■ 99/272 (36%) 



PP+ VG+P P V+ S AP G+P+P 



-ALGPSAPQK-PLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 
+ PPKP AGSPP S A S ++PP 



Positives - 103/264 (39%) 



++ PP +P 



++ P PAP S 



S + + P P A + A ++ S PP AP S 

-SSPPPPIKSPPPP APVSSPPPAPVKPPS — LPPPAPVSS 1135 



5 bits), Expect - 1.8e-06, P - 1.8e-06 
Identities - 59/179 (32%), Positives - 77/179 (43%) 
Query: 3 DFPPPEEAFFSVASPEP-AGPSGSPELVSSPAASSSSATA-LQIQPPGSP--DPPP A 55

+ PPPE S P P + P +P+ PA SS ++ PP +P PPP + 

SbjCt: 970 NLPPPEVK--SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 1027 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PP PAP SS P V P PV PP + P S V+ AP + 

Sbjct: 1028 PPPPAPVSSPPPPVKSPPPPAPVSSPP PPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 116 PTPALGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 17 4 

p p + P P+ ++ P PAP S A +K SL +SS P PP 

Sbjct: 1085 PPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPPAP-VKPPSLPPPAPVSS — P — PPV 1139 

Query: 175 AEPRPPQ 181 
P PP+ 

Sbjct: 1140 VTPAPPK 1146 

Score = 133 (20.0 bits), Expect = 2.3e-05, P = 2.3e-05
Identities = 50/132 (37%), Positives = 59/132 (44%)

Query: 1 MADFPPPEEAFFSVASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP— DPPP 54 

M+ PPPE V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1001 MSSPPPPE VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPP 1055 

Query: 55 A P PA P A P A S S APGH VAKL PQKE P VGC S KG GG P PREDVGA P LVT PS LLQMVRL RS 108 

+PP PAP SS P V P PV PP V +P P + 

Sbjct: 1056 PVKSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PPPIKSPPPPAP 1113 

Query: 109 VGA PGGA PT — PALG PS AP 125 

V +P AP P+L P AP 
Sbjct: 1114 VSSPPPAPVKPPSLPPPAP 1132 

Score = 110 (16.5 bits), Expect = 8.0e-03, P = 8.0e-03
Identities = 41/121 (33%), Positives = 49/121 (40%)

Query: 5 PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP- -DPPP 54 

PPP S V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1060 PPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPP 1119 

Query: 55 AP P A PAP AS S A PGH VAKL PQKE PVGC S KGGG P PRE DVGAP LVT P S LLQMVRL RS 108 

AP P PAP SS P V P K+ + PP E P +L + 

Sbjct: 1120 APVKPPSLPPPAPVSSPPPVVTPAPPKKE EQSLPPPAESQPPPSFNDI ILPPIMANK 1176 

Query: 109 VGAP 112 
+ P 

Sbjct: 1177 YASP 1180 

Score = 108 (16.2 bits), Expect = 1.3e-02, P = 1.3e-02
Identities = 46/155 (29%), Positives = 67/155 (43%)

Query: 114 GAPTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVR-LKACS-LAASEGLSSAQPNG 171 

G PTP GP + P + A S +P+P + P + +LS+A + P+ 
Sbjct: 408 GYPTPGGGPPSSPVPGKPAAS APMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHS 464 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQ ADLQRNLVAELRSISEQR 227 

PP + PP P S + S ++Q +P + Q + + + 

Sbjct: 465 PPADDYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP AP SP PPPV SV PPP S P P+ +PP 
Sbjct: 525 PP-APIGSPSPPPPV SVVSPPPPVKSPPPPAPVGSPP 560 
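
In the HSP headers above, Expect and P are printed with identical values. That is what the standard BLAST relation P = 1 - exp(-E) predicts when E is small, because P is then approximately equal to E. The following is a minimal sketch of that relation only; the function name p_from_expect is hypothetical, and it is an assumption that the program which produced this report derived its P column this way.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one hit this good, given the Expect value.

    Uses the Poisson relation P = 1 - exp(-E). Assumption: the P column
    of the report follows this relation.
    """
    # expm1 keeps precision for very small E, where 1 - exp(-E) loses digits.
    return -math.expm1(-expect)


# Expect values copied from three HSP headers in this report.
for e in (7.9e-12, 2.3e-05, 1.3e-02):
    print(f"Expect = {e:.1e}, P = {p_from_expect(e):.1e}")
```

At the two-digit precision used in the listing, the computed P is indistinguishable from Expect for all three values.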

Pedant information for DKFZphmcfl_lc23, frame 1



Report for DKFZphmcfl_lc23.1

[LENGTH] 311

[MW] 31534.58

[pI] 9.48

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 38.59 % 

SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxx .... xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 

SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL 
SEG xxxxxx xxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxxxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 

SEG xxxxx xxxxxxxxxxxxxxx 

PRD ccccccceeeecccchhhhhccccccchhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceeeccccccccc 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 



(No Prosite data available for DKFZphmcfl_lc23.1)
(No Pfam data available for DKFZphmcfl_lc23.1)
DKFZphmcfl_lel5



group: transmembrane protein 

DKFZphmcfl_lel5 encodes a novel 454 amino acid protein with similarity to C. elegans proteins
and transporter proteins.

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and
to the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unknown
compound.

The new protein can find application as a new transporter in eukaryotic cells, e.g. for
transporting drugs into cells.

similarity to D-XYLOSE TRANSPORTER
membrane regions: 9

complete cDNA, complete cds, EST hits

matches cDNA encoding cell growth inhibiting factor (E12646)

Sequenced by DKFZ

Locus: unknown

Insert length: 1957 bp

Poly A stretch at pos. 1947, polyadenylation signal at pos. 1929



1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 

51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 

101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 

151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 

201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 

251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 

301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 

351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 

401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 

451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC

501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 

551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 

601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 

651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 

701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 

751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC 

801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 

851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 

901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 

951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 

1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 

1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT 

1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC 

1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 

1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 

1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 

1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 

1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 

1401 GCTGGGTGAT GCTGGGAGCC CCTACCTCAT TGGCCTGATC TCTGACCGCC 

1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 

1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 

1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 

1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 

1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 

1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 

1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC 

1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 

1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 

1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 
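
The entry header reports a poly(A) stretch at position 1947 and a polyadenylation signal at position 1929. The sketch below locates both features in the last 57 bases of the sequence listed above. The function name find_polya_features and the tail_offset parameter are hypothetical, introduced only for illustration. The reported positions agree with the sequence only if they are read as 0-based offsets; that convention is inferred from the listing and is not stated in the source.

```python
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation signal hexamer


def find_polya_features(tail: str, tail_offset: int = 0):
    """Return (signal_pos, polya_pos) as 0-based offsets into the full insert.

    tail        -- 3' end of the cDNA sequence, upper case, no spaces
    tail_offset -- 0-based offset of tail[0] within the full insert
    """
    signal_idx = tail.rfind(POLYA_SIGNAL)   # last AATAAA in the tail
    polya_idx = len(tail.rstrip("A"))       # start of the trailing run of A's
    signal_pos = tail_offset + signal_idx if signal_idx >= 0 else None
    return signal_pos, tail_offset + polya_idx


# Bases 1901-1957 of the 1957 bp insert, copied from the listing.
tail = (
    "CCCCAAGGGC" "TCGGTGCTAT" "TTGTAACGGA"
    "ATAAAATTTG" "TAGCCAGAAA" "AAAAAAA"
)
print(find_polya_features(tail, tail_offset=1900))
```

Under the 0-based assumption the result matches the two positions given in the header.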



BLAST Results 



Entry E12646 from database EMBL:

cDNA encoding cell growth inhibiting factor.

Score = 3046, P = 2.2e-131, identities = 640/659
Medline entries



No Medline entry 



Peptide information for frame 1 



ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 



1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI 

51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY 

101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP 

151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV 

201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI 

251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL 

301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV 

351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG 

401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA

451 SVLI 
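
The peptide length follows directly from the ORF coordinates given above. A minimal sketch of that arithmetic is shown here, assuming the coordinates are inclusive, 1-based nucleotide positions and that the reported end is the last coding base (the stop codon is not counted). The function name orf_peptide_length is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based nucleotide bounds."""
    n_nt = end_bp - start_bp + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3 nucleotides")
    return n_nt // 3


# ORF from 340 bp to 1701 bp: 1362 nucleotides, i.e. 454 codons.
print(orf_peptide_length(340, 1701))
```

The same calculation gives 573 for an ORF from 94 bp to 1812 bp.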



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphmcfl_lel5, frame 1

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4,
N = 3, Score = 441, P = 5.2e-76

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid
C39E9, N = 2, Score = 449, P = 6.2e-69

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5,
N = 3, Score = 413, P = 9.1e-60

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein";
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII
project), N = 3, Score = 193, P = 2.5e-24

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER)., N
= 1, Score = 180, P = 7.9e-11



>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9
Length = 488

HSPs: 



Score = 449 (67.4 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 88/204 (43%), Positives = 125/204 (61%)



Query: 58 SALIVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 117

+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W

Sbjct: 29 AGVLTQVQTYYNISDSLGGLIQTVFLISFMVFSPVCGYLGDRFNRKWIMIIGVGIWLGAV 88

Query: 118 LGSSFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGS 177

LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS

Sbjct: 89 LGSSFVPANHFWLFLVLRSFVGIGEASYSNVAPSLISDMFNGQKRSTVFMIFYFAIPVGS 148

Query: 178 GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPL 233

GLG+I GS V + G W W +RV+ G++ ++ L L EP RGA ++ D+

Sbjct: 149 GLGFIVGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVVVT 208

Query: 234 NPTSWWADLRALARNLIFGLITCLTG 259

T++ DL L + L+ C G

Sbjct: 209 TNTTYLEDLVILLKTPT--LVACTWG 232




Score = 267 (40.1 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 74/212 (34%), Positives = 113/212 (53%)




Query: 249 LI FGLITCLTG VLGVGLGVE I SRRL RHSNPRADPLVCATGLLGSAPFLFLSL 300

L FG IT G++GV G +S+ L R RA PLV G L +APFL + +

Sbjct: 277 LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 336

Query: 301 ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 360
S+V YI IF G T + NW + D+L V+ P RRSTA ++ +++SHL GDA
Sbjct: 337 IFGDKSLVLLYIMIFFGITFMCFNWGLNIDMLTTVIHPNRRSTAFSYFVLVSHLFGDASG 396

Query: 361 PYLIGLISDRLRRN-- WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAI FI EADRR-- 416 

PYLIGLISD +R +P ++ +L + C + L + +++ + +DR+ 

SbjCt: 397 PYLIGLISDAIRHGSTYPKD QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 453 

Query: 417 RAQLHVQGLLHEA — GSTD — DRIVVPQRGRSTRV 4 47 

RA++ + L + STD +RI + S+R+ 
SbjCt: 454 RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 4 88 

Score = 70 (10.5 bits), Expect = 5.9e-24, Sum P(2) = 5.9e-24
Identities = 25/89 (28%), Positives = 41/89 (46%)

Query: 62 VAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT--LG 119 

V L +NLLNY+DR+TVA ++ + LG + L+ +S V LG 

SbjCt: 11 VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI — SFMVFSPVCGYLG 68 

Query: 120 SSFI PGEHFWLLLLTRGLVGVGEASYSTIAP 150 

F W++++ G + +G S+ P 

Sbjct: 69 DRF NRKWIMI IGVG- IWLGAVLGSSFVP 95 
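
Each HSP header above gives identities and positives both as a fraction and as a percentage. The sketch below parses such a line and checks that the two agree. The names HSP_RE and check_hsp_line are hypothetical. The percentages in this report appear to be truncated rather than rounded (for example 74/212 is printed as 34%); that is an inference from the listed values, not a documented property of the program.

```python
import re

# Matches a cleaned header line such as
# "Identities = 74/212 (34%), Positives = 113/212 (53%)".
HSP_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)\s*,\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def check_hsp_line(line: str) -> bool:
    """Return True if the printed percentages match the printed fractions."""
    m = HSP_RE.search(line)
    if m is None:
        raise ValueError(f"not an Identities/Positives line: {line!r}")
    ident, ident_len, ident_pct, pos, pos_len, pos_pct = map(int, m.groups())
    return (
        ident_len == pos_len                      # same alignment length
        and ident * 100 // ident_len == ident_pct  # truncated percentage
        and pos * 100 // pos_len == pos_pct
    )


print(check_hsp_line("Identities = 74/212 (34%), Positives = 113/212 (53%)"))
```

A check of this kind is useful mainly for spotting digits that were damaged during text extraction.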



Pedant information for DKFZphmcfl_lel5, frame 1



Report for DKFZphmcfl_lel5.1



[LENGTH] 454

[MW] 49013.35

[pI] 7.66

[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4 2e-51

[BLOCKS] BL01022D

[PROSITE] MYRISTYL 11

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] TRANSMEMBRANE 8 

[KW] LOW_COMPLEXITY 15.42 % 



SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL 

SEG xxxxxxxxxxxxxxxx 

PRD cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS 

SEG 

PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPLNPTSWWA

SEG XXXXXXXXXXXXX 

PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh 

MEM MMMMMMMMM 

SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS

SEG 

PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc 

MEM MMMM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMM 

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL 

SEG XXXXXXXXXXXXX 

PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh 

MEM MMMMMMMM MM 

SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVLI 
SEG

PRD hhhhhhhhccccceeeeeeccccccceeeeeccc 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphmcfl_lel5.1

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004
PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 32->38 MYRISTYL PDOC00008
PS00008 52->58 MYRISTYL PDOC00008
PS00008 139->145 MYRISTYL PDOC00008
PS00008 176->182 MYRISTYL PDOC00008
PS00008 252->258 MYRISTYL PDOC00008
PS00008 262->268 MYRISTYL PDOC00008
PS00008 266->272 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 305->311 MYRISTYL PDOC00008
PS00008 397->403 MYRISTYL PDOC00008
PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013


(No Pfam data available for DKFZphmcfl_lel5.1)
DKFZphmcfl_lgl3



group: mammary carcinoma derived

DKFZphmcfl_lgl3 encodes a novel 573 amino acid protein with very weak similarity to the human
KIAA0543 protein and Musca domestica Hermes transposase.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of
mammary carcinoma-specific genes.

similarity to KIAA0766

complete cDNA, complete cds, few EST hits

encoded at the genomic level by AC005020; no splicing (genomic?)

Sequenced by DKFZ

Locus: unknown

Insert length: 2210 bp 

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176 

1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA 
51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC 

101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG 

151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT 

201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT 

251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT 

301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT 

351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG 

401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC 

451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTACAC GGCTGCAGTC 

501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT 

551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA 

601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA 

651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT 

701 GGAAACATTG TAAAGGAATT TCAAGTGATG GAACAGCAAA TATGACCGGA 

751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC 

801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA 

851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT

901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC 

951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT 
1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG 
1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA 
1101 AGACGACATT TGGGTAACAA AATTGGCATA TTTAAGTGAT ATTTTTGGCA 
1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT 
1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA 
1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT 
1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA 
1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA 
1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA 
1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG
1501 GAGCCTGAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT 
1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA 
1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA 
1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT 
1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG 
1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA 
1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT 
1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA 
1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA 
1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA 
2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG 
2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG 
2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT 
2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG 
2201 AAAAAAAAAA 



BLAST Results 



Entry AC005020 from database EMBL: 

Homo sapiens clone GS2 59H13; HTGS phase 1, 4 unordered pieces. 
Score = 9110, P = 0.0e+00, identities = 1822/1822
Medline entries

No Medline entry 

Peptide information for frame 1 



ORF from 94 bp to 1812 bp; peptide length: 573 
Category: similarity to unknown protein 



1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL 

51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK 

101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI

151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK 

201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS 

251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE 

301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI 

351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP 

401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI 

451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI

501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM 

551 RVALSSCVPD WKELMNRQAH PSH 
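
The peptide above is the conceptual translation of the ORF starting at base 94 of the cDNA. A minimal sketch of that translation is given below, using the standard genetic code and the first 27 bases of the ORF (positions 94-120 of the listed sequence). The names BASES, AMINO, CODON_TABLE and translate are hypothetical; the source does not say which tool performed the translation.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the peptide
            break
        protein.append(residue)
    return "".join(protein)


# Bases 94-120 of the cDNA, copied from the sequence listing.
orf_start = "ATGACTC" "CTGAATCAAG" "GGATACTACA"
print(translate(orf_start))
```

The nine residues produced agree with the start of the peptide listed above.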

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW:

gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens

PAC clone DJ0751H13 from 7q35-qter, complete sequence.

Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179

Entry MD36211_1 from database TREMBL:

product: "Hermes transposase"; Musca domestica Hermes transposase

gene, complete cds.

Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465



Alert BLASTP hits for DKFZphmcfl_lgl3, frame 1

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P
= 1.1e-23

>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds.
Length = 607



l.le-23, P 



CM+ ++R + + L+ + LS + +RI +1 ++L 



LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
YLLVFI RGVGPELEVQEDLLTI I NLTHHFSVGALMSAI LES — LQTAGLSLQR 240 



G+++ T M G++S L + E + WN H F+H E L S ++ + 



WL +GK L ++ LR E+ 



+ F D W+ 



Score 


- 300 


Identities 1 


Query: 


89 


Sbjct: 


124 


Query: 


148 


Sbjct: 


183 


Query: 


206 


Sbjct: 


241 


Query: 


262 


Sbjct: 


299 


Query: 


321 


Sbjct: 


359 
Query: 381 TLLLWQARLKSNRPSYYMFPTLLQHI EE NI INEDCLKEIKLEILLHLTSLSQTFNY 436

It L+Q ++ + FP L + ++E N +E + ++++ L + F 

Sbjct: 418 KLNLFQRHIEEKNLTD — FPALREVVDELKQQNKEDEKIFDPDRYQMVI — CRLQKEFER 473 

Query: 437 YFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILSL 495 

+F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I L 

Sbjct: 474 HFKDLRF — IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRIKDL 525 

Query: 496 SAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDMR 551 

F+ + + +p++ + + F + +CE FS LTR + L R 

Sbjct: 526 GQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQAL.fr 585 



Query: 552 VALSSCVPDWKELMNRQAHPSH 573 

VA + P W +L+ R+ + S+ 
Sbjct: 586 VATTEMEPGWDDLV-RERNESN 606 

Score = 290 (43.5 bits), Expect = 1.5e-22, P = 1.5e-22
Identities = 120/485 (24%), Positives = 228/485 (47%)



Query: 89 CMD-MVRTIFDDKSADKLRTI PLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147 

CM+ ++R + + L+ + LS + +RI +1 ++L L R + +++ LD+ 

Sbjct: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182 

Query: 148 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205 

+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
Sbjct: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES — LQTAGLSLQR 240 

Query: 206 C KG I S S DGTANMTG KH S RLT E KL LEATHNNAVWNHC F I H REA L V S KE I S PS LMOV- LKN A 264 

G+++ T M G++S L + E + WN IH + E+ S DV + 
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWN— VIHYSGFLHLELLSSY-DVDVNQI 297 

Query: 265 VKTVN FIKGSSLNSRLLEI FCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNE 319 

+ T++ IK + + +E H + + WL +GK L ++ LR E 

Sbjct: 298 INTISEWIVLIKTRGVRRPEFQTLLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKE 357 

Query: 320 IYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQ 379 

+ FLV + + F D W+ +L DI L ELS +++ +HI F+ 

SbjCt: 358 MEAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFE 416 

Query: 380 KTLLLWQARLKSNRPSYYMFPTLLQHIEENI INEDCLKEIKL EILLHLTSLSQTFN 435 

L L+Q ++ + FP L + + + E + + ++K+ ++L + F 

Sbjct: 417 VKLNLFQRHIEEKNLTD--FPALREVVDE--LKQQNKEDEKIFDPDRYQMVICRLQKEFE 472 

Query: 4 36 YYFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILS 494 

+ F + +F +K+++ +- +PF F+ + I + +E L +L ++ L N Y+I 

SbjCt: 473 RHFKDLRF — IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRIKD 524 

Query: 4 95 LSAFWIKIK-DDFPLLSRKSI LLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDM 550 

L F+ + + +P++ + + F + +CE FS LTR + L 
Sbjct: 525 LGQFYAGLSAESYPI IKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALF 584 

Query: 551 RVALSSCVPDWKELMNRQAHPSH 573 

RVA + P W +L+ R+ + S+ 
Sbjct: 585 RVATTEMEPGWDDLV-RERNESN 606 



Pedant information for DKFZphmcf l_lgl3, frame 1 



Report for DKFZphmcf l_lgl3 . 1 



[LENGTH] 573 

[MW] 66276.85 

[pI] 5.82 

[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens 
mRNA for KIAA0766 protein, complete cds. 1e-18 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.90 % 



SEQ MTPESRDTTDLSPGGTQEHEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA 

SEG xxxxxxx 

PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh 

SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDHVRTIFDDKSADKLRTIPLSDNTISRRIC 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhcccccceeeeecccchhhhhhh 

SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS 

SEG 

PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce 

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH 

SEG 

PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee 

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE 

SEG 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhhh 

SEQ VRWLSQGKVLSRVYELRHEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhchhhhhcccceeehhhhhhhhhhhhhhhhhhh 

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK 

SEG xxxxx 

PRD hhccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhcchhhhhh 

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL 

SEG xxxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh 

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK 

SEG xxx..... xxxxxxxxxxx 

PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh 

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH 

SEG 

PRD hcccccccccceeeccccccchhhhhhhccccc 



Prosite for DKFZphmcf l_lgl3 . 1 

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001 

PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 

PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005 

PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 

PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005 

PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 

PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005 

PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005 

PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005 

PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 

PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 

PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 

PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 

PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 

PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006 

PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006 

PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006 

PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 

PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006 

PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007 

PS00008 137->143 MYRISTYL PDOC00008 

PS00008 273->279 MYRISTYL PDOC00008 

PS00008 289->295 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphmcf l_lg!3. 1 ) 
DKFZphtes3_14g5 



group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to murine cell 
growth regulating nucleolar protein LYAR. 

The novel protein is very similar to the murine Ly-1 antibody reactive clone protein (LYAR). It 
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate groups 
of an ATP/GTP nucleotide), but not the zinc finger motif and nuclear localization signals of 
LYAR. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440 



1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT 

51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT 

101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT 

151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA 

201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG 

251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA 

301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA 

351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA 

401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG 

451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG 

501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT 

551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC 

601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT 

651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA 

701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA 

751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG 

801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA 

851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA 

901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG 

951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA 

1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG 

1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG 

1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT 

1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG 

1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG 

1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA 

1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG 

1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA 

1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT 

1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

1501 AAA 
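
The header of this entry reports the polyadenylation signal at pos. 1440 and the poly A stretch at pos. 1467. As a hedged illustration, the sketch below searches the 3' end of the insert (positions 1401 to 1503 as printed above) for the canonical AATAAA hexamer and for the first run of at least ten A residues. The function names, the run-length threshold and the reading of the reported positions as zero-based offsets are assumptions made for this example, not statements from the report.

```python
import re

# 3' end of the insert, positions 1401-1503 (1-based) as printed above.
TAIL_START = 1401
TAIL = (
    "AGTTAAGAAAATATATTTTTGGTATAACTTTTATGAGAAAAATAAAATAT"
    "ATTCTGGTCCAAACTTCAAA" + "A" * 33
)


def polya_signal_offset(seq: str, seq_start: int = 1) -> int:
    """Zero-based offset in the full insert of the first AATAAA, or -1."""
    i = seq.find("AATAAA")
    return -1 if i < 0 else (seq_start - 1) + i


def polya_stretch_offset(seq: str, seq_start: int = 1, min_run: int = 10) -> int:
    """Zero-based offset in the full insert of the first run of >= min_run A's."""
    m = re.search("A{%d,}" % min_run, seq)
    return -1 if m is None else (seq_start - 1) + m.start()


print(polya_signal_offset(TAIL, TAIL_START))
print(polya_stretch_offset(TAIL, TAIL_START))
```

Under these assumptions the two offsets agree with the positions given in the header; counted from 1, the hexamer starts at base 1441 and the A run at base 1468.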



BLAST Results 



No BLAST result 



Medline entries 



93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is involved in cell 
growth regulation. 

Peptide information for frame 3 



ORF from 144 bp to 1280 bp; peptide length: 379 
Category: strong similarity to known protein 
Classification: Cell division 
Prosite motifs: ATP_GTP_A (60-68) 



1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK 
51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ 
101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ 
151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK 
201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK 
251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED 
301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS 
351 EEELLVIFNK KISKNPTFKL LKDKVKLVK 
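
The ORF coordinates and the peptide length given above are tied together by codon arithmetic: the number of bases from the first base of the start codon through the last base of the final sense codon, divided by three. A minimal Python sketch, assuming the reported range is 1-based, inclusive and excludes the stop codon (consistent with the TGA at positions 1281 to 1283 of the insert); `orf_peptide_length` is a hypothetical helper:

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates
    that cover the sense codons only (stop codon excluded)."""
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3


# ORF from 144 bp to 1280 bp, as reported for frame 3.
print(orf_peptide_length(144, 1280))
```

The same check can be applied to the other ORF ranges in this document, for example 33 to 1976 bp and 216 to 692 bp.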

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14g5, frame 3 

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N = 
1, Score = 1410, P = 2.7e-144 

SWISSPROT:YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN 
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35 

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding 
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic 
sequence, complete sequence., N = 3, Score = 139, P = 4e-15 

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11 

>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388 



Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144 
Identities = 275/388 (70%), Positives = 317/388 (81%) 



Query: 1, Sbjct: 1 
MVFFTCN ACGES VKK I QVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG 

Query: 61, Sbjct: 61 
KG YE KTHKGD KQQAWIQKI+ELIK+PNVSPKVRELL+QISAFDNVP K KAKFQNWMKN 

Query: 121, Sbjct: 121 
SLKVH++S+L+QVW+I FSEAS+SE ++Q Q P H A PHAE+ TKVP++K 

Query: 180, Sbjct: 177 
+Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA EA GE+ 

Query: 240, Sbjct: 237 
+ AGKRKR +HS E+ KKKKM 

Query: 288, Sbjct: 297 
KLPE PE GE +D EAP+KGKFNWKGTIKA+LKQAPDNEI ++KKL+KKV+AQY+ V ++ 

Query: 348, Sbjct: 357 
EEELL IFN+KIS+NPTFK+LKD+VKL+K 



Pedant information for DKFZphtes3_14g5, frame 3 
Report for DKFZphtes3_14g5.3 



[LENGTH] 379 

[MW] 43634.03 

[pI] 9.59 

[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11 

[BLOCKS] BL00603D Thymidine kinase cellular-type proteins 

[BLOCKS] BL00530C 

[PROSITE] ATP_GTP_A 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 18.73 % 



SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 

SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ 

SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 

SEG . .xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK 

SEG xxxxx 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 



Prosite for DKFZphtes3_14g5.3 
PS00017 60->68 ATP_GTP_A PDOC00017 
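
PS00017 (ATP_GTP_A, the P-loop) is defined by the PROSITE pattern [AG]-x(4)-G-K-[ST]. As an illustration, the pattern can be written as a regular expression and scanned over the N-terminal part of the frame 3 peptide. This is a minimal sketch: `find_p_loops` is a hypothetical helper, and the positions it returns are 1-based with an inclusive end, so the end coordinate may differ by one from the convention used in the report above.

```python
import re

# PROSITE PS00017: [AG]-x(4)-G-K-[ST], wrapped in a lookahead so that
# overlapping occurrences are also reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

# First 70 residues of the frame 3 peptide of DKFZphtes3_14g5.
N_TERM = (
    "MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVK"
    "CISEDQKYGGKGYEGKTHKG"
)


def find_p_loops(peptide: str):
    """Return (start, end, motif) tuples, 1-based with inclusive end."""
    hits = []
    for m in P_LOOP.finditer(peptide):
        motif = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(motif) - 1, motif))
    return hits


print(find_p_loops(N_TERM))
```

The single hit starts at residue 60, the start position listed for ATP_GTP_A in the Prosite report.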



(No Pfam data available for DKFZphtes3_14g5.3) 
DKFZphtes3_14h21 



group: nucleic acid management 

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus 
RNA helicase and several RNA-dependent ATPases from the DEAD box family. 

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD box and an ATP/GTP-binding site motif A (P-loop) 
and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2200 bp 

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140 



1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA 
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT 
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC 
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG 
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC 
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA 
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA 
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA 
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA 
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT 
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG 
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG 
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT 
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT 
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA 
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC 
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG 
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA 
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT 
951 CAACCCAGCC TTAAAGGTCA AAGGAATAGA CCCGGCATGT TAGTTCTAAC 
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT 
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT 
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC 
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA 
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT 
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC 
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT 
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT 
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA 
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG 
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC 
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA 
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA 
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT 
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA 
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA 
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG 
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG 
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC 
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT 
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT 
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA 
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA 
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294) 
DEAD_ATP_HELICASE (394-403) 



1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR 
51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT 
101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ 
151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK 
201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY 
251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG 
301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY 
351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK 
401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV 
451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA 
501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG 
551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS 
601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21, frame 3 



TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A, N = 1, Score = 1008, P = 1.1e-101 

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like 
protein."; S.pombe chromosome II pi p8B7., N = 1, Score = 971, P = 
9.1e-98 

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 970, P = 1.2e-97 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 961, P = 1e-96 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91 



>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A 

Length = 504 



HSPs: 



Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101 
Identities = 211/473 (44%), Positives = 298/473 (63%) 



Query: 174 DQIREEGLKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEK 233 

D++++E W K PI ++ YK +S + + ++ 

Sbjct: 23 DRLKDENFSWMK-------PIVRDLYKIPNEQKNLSPEQLQELYTNGGVMKVYPFREEST 75 

Query: 234 RPIPNPTCTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKT 293 

IP P +F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT 

Sbjct: 76 VKIPPPVNSFEQAFGSNASIMGEIRKNGFEKPSPIQSQMWPLLLSGQDCIGVSQTGSGKT 135 

Query: 294 LCYLMPG FIHLVLQPSL KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348 

L +L+P +H + Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC 

Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195 

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408 

+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE 

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255 

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468 

I +IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW---SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524 

+ ++ + FL + + K+I+FV K +ADHLSSD + I+ + LHG R Q 

Sbjct: 316 PHDSRFLRVCEIVNFLTAAHGQNYKMIIFVKSKVMADHLSSDFCMKGINSQGLHGGRSQS 375 

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584 

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR 
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644 

G +++ L ND LI ILE++ Q +P++L AE++ K + R RP R 

Sbjct: 436 GEAMSFLWWNDRSNFEGLIQILEKSEQEVPDQLRRDAEKYRL---KCQSGRDGPRPSFRN 492 

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 



Pedant information for DKFZphtes3_14h21, frame 3 



Report for DKFZphtes3_14h21.3 



[LENGTH] 648 

[MW] 72873.51 

[pI] 8.84 

[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101 

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 4e-72 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YOR204w] 2e-70 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61 

[FUNCAT] l genome replication, transcription, recombination and repair [H. 
influenzae, HI0892] 2e-49 

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 9e-45 

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44 

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 2e-28 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 5e-10 

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07 

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 

[PIRKW] nucleus 4e-96 

[PIRKW] RNA binding 3e-87 

[PIRKW] DEAD box 5e-50 

[PIRKW] transmembrane protein 4e-27 

[PIRKW] DNA binding 3e-67 

[PIRKW] recF recombination pathway 3e-10 

[PIRKW] ATP 4e-96 

[PIRKW] purine nucleotide binding 5e-50 

[PIRKW] P-loop 4e-96 

[PIRKW] hydrolase 9e-45 

[PIRKW] protein biosynthesis 5e-50 

[PIRKW] ATP binding 1e-61 

[SUPFAM] WW repeat homology 8e-88 

[SUPFAM] DEAD/H box helicase homology 4e-96 

[SUPFAM] unassigned DEAD/H box helicases 7e-87 

[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96 

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43 

[SUPFAM] recQ protein 3e-10 

[SUPFAM] Bloom's syndrome helicase 5e-07 

[SUPFAM] translation initiation factor eIF-4A 5e-50 

[SUPFAM] recQ helicase homology 3e-10 

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-68 

[PROSITE] DEAD_ATP_HELICASE 1 

[PROSITE] ATP_GTP_A 1 

[PFAM] Helicases conserved C-terminal domain 

[PFAM] KH domain family of RNA binding proteins 

[PFAM] DEAD and DEAH box helicases 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.49 % 

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT 

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG 

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD 

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF 

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI 

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS 

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcccccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 



Prosite for DKFZphtes3_14h21.3 

PS00017 286->294 ATP_GTP_A PDOC00017 
PS00039 394->403 DEAD_ATP_HELICASE PDOC00039 



Pfam for DKFZphtes3_14h21 . 3 
HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296 

HMM UPMLQHIDwdPWpqpPQd. . PrALILAPTRELAMQIQEEcRkFgkHMng 

L+P ++H+ +P +++ Q+ P +L+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343 

HMM IRIracIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleM 
+R++C+YGG N ++Q+++L++G+ +I+IATPGRL D+ +++ ++L++I++ 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392 

HMM LVMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARr 

LV+DEAD+MLDMGF++QI++I+ ++ ++RQT+M SAT + P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR--PDRQTVMTSATWPHSVHRLAQS 440 

HMM FMRNPIRInld.MdElTtnEnlkQwYiyVerEMWKfdcLcrLle* 
++++ p + ++ D +++ +KQ +I+ E++K + ++++ 
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 



HMM_NAME KH domain family of RNA binding proteins 

HMM *rIiIPedhMGMIIGKGGsNIRqIREEYgvrINIPdecCeD3tdRI IT It 

+ + ++++G++IG+GGS I++I++ ++++I I++E+ + + + I 
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-P ESLVKIF 115 

HMM G* 
G 

Query 116 G 116 



HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTD 

+ +++ L+ + +I+V ++HGD++Q++R++++++F++G+ ++LI+TD 
Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545 

HMM VggRGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG* 

+++RG+D+ DV HV+N+D+P+N+E Y++RIGRTGR+G 
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG 582 
DKFZphtes3_14pl4 



group: testes derived 

DKFZphtes3_14pl4 encodes a novel 159 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 3969 bp 

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927 

1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 

101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 

151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 

201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 

251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 

301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGTTGGGCT TGACATTCAG 

351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 

401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 

451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT 

501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 

551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 

601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 

651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 

701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 

751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 

801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 

851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATTCAGAAAA 

901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 

951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA 
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCVTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA 
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT 
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAG GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG 
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC 
2001 ATGCCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG 
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGGA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGCC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC 
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA 
2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT 
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG 
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA 
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG 
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 
3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT 
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG 
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG 
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 



1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR 
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG 
151 SGRSYSSLQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14pl4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_14pl4 , frame 3 



Report for DKFZphtes3_14pl4 . 3 



[LENGTH] 159 

[MW] 17778.55 

[pI] 5.74 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04 

[KW] Alpha_Beta 



SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM 

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS 

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccccc 
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_14pl4.3) 
(No Pfam data available for DKFZphtes3_14pl4 .3) 






WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_14p7 



group: testes derived 

DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 



1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAAATAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA 
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA 
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG 
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA 
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG 
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT 
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT 
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT 
901 TGATTCATCA TTAGTAAGAA GTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 
1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGTGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA 
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA 
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT 
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCAGTAACAA CATGGATGGA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT 
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC 
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTGCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTG CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT 
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT 
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA 
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT 
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 
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NO BLAST result 



No Medline entry 



Medline entries 



Peptide information for frame 2 



ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 
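
The ORF coordinates and the peptide length reported above can be cross-checked against each other. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and do not include the stop codon (an assumption that is consistent with the values in this listing, not a documented property of the annotation software). The function name `orf_peptide_length` is illustrative only.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons spanned by an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not precede start_bp")
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


# ORF from 20 bp to 2125 bp, reported peptide length 702
print(orf_peptide_length(20, 2125))
```

A span that is not a multiple of three raises an error, which is a quick way to spot an OCR-damaged coordinate.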



1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS 
51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD 
101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI 
151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN 
201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI 
251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR 
301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS 
351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT
401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA 
451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV 
501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ 
551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD 
601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLLLSSFL
651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP 
701 SF 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2 

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B,
complete cds., N = 2, Score = 97, P = 0.00039

>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete
cds.

Length = 772

HSPs:

Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 45/163 (27%), Positives = 77/163 (47%)

Query: 442 LTRVLANI AIHPGVGPVLAANPGI VGLLLTTLEYKSLDDCEELVINATATINNLSYYQVK 501 

L ++ + NI+ H G P VG L + S D+ EE VI T+ NL+ + 
Sbjct: 483 LMKMIRNI SQHDG — PTKNLFIDYVGDLAAQI SSDEEEEFVIECLGTLANLTIPDLD 537 

Query: 502 -NSIIQDKKLYIAELLLKLLVSNNMDG-ILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMA 559 

+++ + KL + L KL D +LE V + G +S D + ++ + ++ 

Sbjct: 538 WELVLKEYKL-VPFLKDKLKPGAAEDDLVLEWIMIGTVSMDDSCAALLAKSGI I PALI E 596 

Query: 560 LLDAQHQDICFSACGVLL NLTVDKDKR-VILKEGGGIKKLVDCLRD 604 

LL+AQ +D F C++ + + R VI+KE L+D + D 

Sbjct: 597 LLNAQQEDDEF-VCQI IYVFYQMVFHQATRDVIIKETQAPAYLIDLMHD 64 4 

Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 42/178 (23%), Positives = 82/178 (46%)

Query: 169 K LAK I I LALKVSRKNLLNVCK-L IFKISRNEKN DSLIQNDS I LESLLEVLRSEDLQTNME 227 

K K L V ++ LL V L+ ++ + + + ++N +1+ L++ L + N E 
Sbjct: 263 KTFKKYQGLVVKQEQLLRVALYLLLNLAEDTRTELKMRNKNIVHMLVKALDRD NFE 318 

Query: 228 AFLYCMGSIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVT 287 

+ + +K +S + N+M+ VE L+ +1 +E++ L + + 

Sbjct: 319 LLILVVSFLKKLSIFMENKN DMVEMDIVEKLVKMIPCEHEDL LNITLR 366 

Query: 288 ATLRNLVDSSLVRSKFLNISALPQLCTAM— EQYKGDKDVCT--NIARI--FSKLTSYRD 341 

L D+ L R+K + + LP+L + E YK +C +1+ F + +Y D 
Sbjct: 367 LLLNLSFDTGL-RNKMVQVGLLPKLTALLGNENYK-QIAWCVLYHISMDDRFKSMFAYTD 424 
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Query: 342 CCTAL 346 
C L 

Sbjct: 425 CIPQL 429 

Score = 69 (10.4 bits), Expect = 2.6e+00, Sum P(2) = 9.2e-01
Identities = 35/146 (23%), Positives = 70/146 (47%)

Query: 512 IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 571 

I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+ 

Sbjct: 304 IVHMLVKALDRDN FELLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDLLNI 363 

Query: 572 ACG VLLN LT VDK DKRV I LKEGG6 1 KKL V DC L RDLG PTDW - QLAC LVC KTLWN FS EN I TNA 630 

+LLNL+ D R + + G + KL L G ++ Q+A +C L++ S + 
Sbjct: 364 TLRLLLNLS FDTGLRNKMVQVGLLPKLTALL GNENYKQIA--MC-VLYHISMD-DRF 416 

Query: 631 SSCFGNEDT-NTLLLLLSSFLDEELALD 657 

S F D L+ +L DE + L+ 
Sbjct: 417 KSMFAYTDCIPQLMKMLFECSDERIDLE 4 44 

Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03
Identities = 18/58 (31%), Positives = 30/58 (51%)

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241 

LI +++RN N + L+ N++ L +L VLR + +L TN+ +C S G 

Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNII YI FFCFSSFSHFHG 212 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00
Identities = 26/122 (21%), Positives = 53/122 (43%)

Query: 283 LVQVTATLRNL VDSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTS 338 

+++ TL NL +0 LV ++ +P I ++ + D+ + I S 

Sbjct: 521 VIECLGTLANLTI PDLDWELVLKEY KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 576 

Query: 339 YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 398 

D C AL + S + L+N Q+ + V +++++ + + R+ KE + 

Sbjct: 577 MDDSCAALLAKSGI I PALIELLNAQQEDDEFVCQI I YVFYQMVF-HQATRDVI I KETQAP 635 

Query: 399 QTLLSL 404 
L+ L 

Sbjct: 636 AYLIDL 641 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00
Identities = 44/177 (24%), Positives = 79/177 (44%)

Query: 481 CE-SLVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 537 

CE E ++N T + NLS+ ++N ++Q ++ LLL+N IA+V+ 
Sbjct: 355 CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGNENYKQI--AMCVLYH 409 

Query: 538 LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 596 

+SD F +++ML+ +1 tNL +K ++ EG G+K 

Sbjct: 410 ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRKVQLICEGNGLK 469 

Query: 597 KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSC FGNEDTNTLLLLLSSFLDEELAL 656 

L+ R L D L+ K + N S++ + F + L +SS +EE + 

Sbjct: 470 MLMK- -RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 522 

Query: 657 0 657 
+ 

Sbjct: 523 E 523 

Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02
Identities = 20/66 (30%), Positives = 34/66 (51%)

Query: 304 LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 362 

LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N + I + 

Sbjct: 171 LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 229 

Query: 363 YQKKQDL 369 
K+ +L 

Sbjct: 230 ELKRHEL 236 
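
Each HSP summary above reports identities and positives as a fraction followed by a percentage. The sketch below shows one way such a line could be parsed and the percentage recomputed. It assumes the percentages are truncated rather than rounded, which matches the values printed in this listing (e.g. 45/163 is given as 27%). The names `HspCounts`, `parse_identity_line` and `truncated_percent` are hypothetical helpers, not part of any BLAST tool. Only the fractions are matched, so a line still parses when the separator between label and value is not a clean `=`.

```python
import re
from typing import NamedTuple


class HspCounts(NamedTuple):
    identities: int
    positives: int
    length: int


# Matches "45/163" style fractions; the label and separator are ignored.
_FRACTION = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_identity_line(line: str) -> HspCounts:
    """Extract identities, positives and alignment length from an HSP line."""
    fractions = _FRACTION.findall(line)
    if len(fractions) != 2:
        raise ValueError(f"expected two fractions, found {len(fractions)}")
    (ident, len_a), (pos, len_b) = fractions
    if len_a != len_b:
        raise ValueError("identities and positives disagree on alignment length")
    return HspCounts(int(ident), int(pos), int(len_a))


def truncated_percent(count: int, length: int) -> int:
    """Percentage with the fractional part dropped (assumed convention)."""
    return 100 * count // length


counts = parse_identity_line("Identities = 45/163 (27%), Positives = 77/163 (47%)")
print(counts, truncated_percent(counts.identities, counts.length))
```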

Pedant information for DKFZphtes3_14p7, frame 2 
Report for DKFZphtes3_14p7.2 

[LENGTH] 702

[MW] 79266.35

[pI] 6.57
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[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04

[BLOCKS] BL00923F Aspartate and glutamate racemases proteins

[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins

[PROSITE] MYRISTYL 9

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 11

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 7.49 %

SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH

SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII 

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG 

SEG xxxx . 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV 

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA 

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE 

SEG 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG 

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF 

SEG xxx 

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 



Prosite for DKFZphtes3_14p7.2 
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(No Pfam data available for DKFZphtes3_14p7.2)
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DKFZphtes3_15al3 
group: testes derived 

DKFZphtes3_15al3 encodes a novel 387 amino acid protein with weak similarity to S. cerevisiae
Hop1.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to S. cerevisiae Hop1

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST hits

S. cerevisiae Hop1p is a meiosis-specific protein

Sequenced by GBF 

Locus: unknown 

Insert length: 1848 bp 

Poly A stretch at pos. 1766, no polyadenylation signal found 

1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT 
51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA 

101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG 

151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT 

201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG 

251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG 

301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG 

351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC 

401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT 

451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA

501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC 

551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA 

601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC 

651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA 

701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT 

751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA 

801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA 

851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA 

901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT 

951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG 
1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA 
1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA 
1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA 
1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT 
1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA 
1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG 
1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT 
1351 CCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA 
1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT 
1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC
1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT 
1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA 
1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA 
1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT 
1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT 
1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 
1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG 

BLAST Results 

NO BLAST result 

Medline entries 



No Medline entry 
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Peptide information for frame 2 



ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 



1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE 
51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED 
101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI 
151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP 
201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK 
251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES 
301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN 
351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15al3, frame 2 

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score = 274,
P = 5.7e-22

TREMBL:SC9877_9 gene: "hop1"; S. cerevisiae chromosome IX cosmid 9877.,
N = 2, Score = 126, P = 7.1e-09

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 126, P = 7.8e-08



>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence.
Length = 562



Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22
Identities = 84/290 (28%), Positives = 145/290 (50%)



Query: 


22 


Sbjct: 


11 


Query: 


82 


Sbjct: 


71 


Query: 


131 


Sbjct: 


130 


Query: 


185 


Sbjct: 


190 


Query: 


236 


Sbjct: 


249 



TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + 



M G YDALQ+KY+ T 



-NPEDPQTISECYQFKFKYTNNGP— LMDFISK— NQSN 130 
D I E Y F F Y+++ +M I++ N+ N 



+ ++ ++R + LM+ L +P++ + MKL YYD+VTPPDY+PP 



p+ + +G V++ 



-ERERMENIDSTILS 235 
E + M++ D + 



QE+ 



-TSDDLDIETKMEEQEKN PASSE 281 
DD D E ++ ++PA +E 



Pedant information for DKFZphtes3_15al3, frame 2 



Report for DKFZphtes3_15al3.2



[LENGTH] 387

[MW] 44417.64

[pI] 5.57

[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence. 9e-23

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11

[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11

[PIRKW] nucleus 2e-09

[PIRKW] zinc finger 2e-09
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[PIRKW] DNA binding 2e-09
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] Alpha_Beta



SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh 

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKMANGNQPVKSSKENRKRSQHESGR 

PRD ccccccchhhhhhhhhhcccccccccccccceceeeccccccccchhhhhhhhhhcccce 

SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI 

PRD eeeeecccccccccccccccccccccc 



Prosite for DKFZphtes3_15al3.2



PS00001 127->131 asn_glycosylation PDOC00001
PS00001 130->134 asn_glycosylation PDOC00001
PS00001 315->319 asn_glycosylation PDOC00001
PS00004 140->144 camp_phospho_site PDOC00004
PS00004 351->355 camp_phospho_site PDOC00004
PS00004 378->382 camp_phospho_site PDOC00004
PS00005 139->142 pkc_phospho_site PDOC00005
PS00005 167->170 pkc_phospho_site PDOC00005
PS00005 221->224 pkc_phospho_site PDOC00005
PS00005 235->238 pkc_phospho_site PDOC00005
PS00005 329->332 pkc_phospho_site PDOC00005
PS00005 346->349 pkc_phospho_site PDOC00005
PS00005 358->361 pkc_phospho_site PDOC00005
PS00006 96->100 ck2_phospho_site PDOC00006
PS00006 103->107 ck2_phospho_site PDOC00006
PS00006 177->181 ck2_phospho_site PDOC00006
PS00006 221->225 ck2_phospho_site PDOC00006
PS00006 260->264 ck2_phospho_site PDOC00006
PS00006 268->272 ck2_phospho_site PDOC00006
PS00006 280->284 ck2_phospho_site PDOC00006
PS00006 308->312 ck2_phospho_site PDOC00006
PS00006 318->322 ck2_phospho_site PDOC00006
PS00006 346->350 ck2_phospho_site PDOC00006
PS00006 354->358 ck2_phospho_site PDOC00006
PS00006 369->373 ck2_phospho_site PDOC00006
PS00008 84->90 myristyl PDOC00008



(No Pfam data available for DKFZphtes3_15al3.2)
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PCT/IB00/01496 



group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to
2-hydroxyacid dehydrogenases.

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenases signature.
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase
(EC 1.1.1.28) catalyzes the oxidation of D-lactate to pyruvate; D-glycerate dehydrogenase
(EC 1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; 3-phosphoglycerate
dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of D-3-phosphoglycerate to
3-phosphohydroxypyruvate.
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase.

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent
pathways and as a new enzyme for biotechnological production processes.



strong similarity to C. elegans T03F1.1

potential start at Bp 55 matches kozak consensus PyCCatgG 

Sequenced by GBF 

Locus: unknown 

Insert length: 1956 bp

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903 



1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC 
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG 
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC 
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA 
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA 
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT 
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA 
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC 
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA 
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA 
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA 
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG 
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG 
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA 
701 TACAGCTTAT AATTCCTGGA GAATCTGCTT GTTTTGCGTG TGCTCCACCA 
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT 
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG 
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC 
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC 
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA 
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA 
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT 
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG 
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC 
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGGAAG ACCTCATGGC 
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT 
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT 
1351 TAGGGCAACA TTAATTAATG TATATTCTTA CCTGAATTGT TATACTTTTT 
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC 
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT 
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA 
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC 
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG 
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC 
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA 
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG 
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA GACAAAAAAA AAAAAAAAAA 
1951 AAAAAG 



BLAST Results 



No BLAST result 
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Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein 
Classification: Metabolism 
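
The peptide reported for this ORF can be reproduced from the nucleotide sequence by translating in frame from the start position. The sketch below assumes the standard genetic code and 1-based coordinates. The function name `translate` and the use of `X` for unreadable codons are illustrative choices. The fragment used here is positions 51 to 80 of the insert, so the ATG at Bp 55 sits at position 5 of the fragment.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate from a 1-based start position until a stop codon or the end.

    Spaces are ignored, so a sequence can be pasted in blocks of ten bases.
    Codons containing characters other than A, C, G, T are written as 'X'.
    """
    seq = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Positions 51-80 of the DKFZphtes3_15c24 insert; the ORF starts at Bp 55.
fragment = "AGCCATGGCG GAGTCTGTGG AGCGCCTGCA"
print(translate(fragment, start_bp=5))
```

Comparing the output with the first residues of the listed peptide is a simple way to confirm the frame and to locate OCR-damaged bases.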

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105)



1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 
51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL
101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 
151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 
201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 
251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN 
301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS 
351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK
401 MKNM 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c24, frame 1 

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid
T03F1., N = 1, Score = 1204, P = 1.9e-122

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1
YAC yUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus
fulgidus, N = 1, Score = 218, P = 1.8e-17

TREMBL:AF022796_4 gene: "moeB"; product: "MoeB"; Staphylococcus
carnosus molybdenum cofactor biosynthetic gene cluster, complete
sequence., N = 1, Score = 220, P = 3.7e-16



>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1.
Length = 419

HSPs:



Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122
Identities = 241/367 (65%), Positives = 293/367 (79%)



Query:  37 RVRIEKMSSEVVDSNPYSRLMALKRMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCG 96
           R +IEK+S+EVVDSNPYSRLMAL+RMGIV++YE+IR   VA+VGVGGVGSV AEMLTRCG
Sbjct:  48 RQKIEKLSAEVVDSNPYSRLMALQRMGIVNEYERIREKTVAVVGVGGVGSVVAEMLTRCG 107

Query:  97 IGKLLLFDYDKVELANMNRLFFQPHQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVEN 156
           IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA  TL ++NPDV  EVHN+NITT++N
Sbjct: 108 IGKLILFDYDKVEIANMNRLFYQPNQAGLSKVEAARDTLIHVNPDVQIEVHNFNITTMDN 167

Query: 157 FQHFMDRISNGGLEEGKPVDLVLSCVDNFEARMTINTACNELGQTWMESGVSENAVSGHI 216
           F  F++RI  G L +GK +DLVLSCVDNFEARM +N ACNE  Q WMESGVSENAVSGHI
Sbjct: 168 FDTFVNRIRKGSLTDGK-IDLVLSCVDNFEARMAVNMACNEENQIWMESGVSENAVSGHI 226

Query: 217 QLIIPGESACFACAPPLVVAANIDEKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNF 276
           Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF
Sbjct: 227 QYIEPGKTACFACVPPLVVASGIDERTLKRDGVCAASLPTTMAVVAGFLVMNTLKYLLNF 286

Query: 277 GTVSFYLGYNAMQDFFPTMSMKPNPQCDDRNCRKQQEEYKKKVAALPKQ-EV-IQEEEEI 334
           G VS Y+GYNA+ DFFP  S+KPNP CDD +C ++Q+EY++KVA  P   EV + EEE +
Sbjct: 287 GEVSQYVGYNALSDFFPRDSIKPNPYCDDSHCLQRQKEYEEKVANQPVDLEVEVPEEETV 346

Query: 335 IHEDNEWGIELVSEVSEEELKNFSGPVPDLPEGITVAYTIPKKQEDSVTELTVEDSGESL 394
           +HEDNEWGIELV+E SE   +  S    +   G+  AY  P K+ D+ TEL+   +  +
Sbjct: 347 VHEDNEWGIELVNE-SEPSAEQSSSL--NAGTGLKFAYE-PIKR-DAQTELSPAQA--AT 399

Query: 395 EDLMAKMKN 403
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D M +K+ 
Sbjct: 400 HDFMKSIKD 408 

Pedant information for DKFZphtes3_15c24, frame 1

Report for DKFZphtes3_15c24.1

[LENGTH] 404

[MW] 44863.36

[pI] 4.79

[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-115

[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation,
palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S.
cerevisiae, YDR390c UBA2 - E1-like] 4e-07

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07

[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06

[BLOCKS] BL01042A Homoserine dehydrogenase proteins

[PIRKW] thiamine pyrophosphate 1e-07

[PIRKW] molybdenum 5e-07

[PIRKW] molybdopterin biosynthesis 5e-07

[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12

[PROSITE] D_2_HYDROXYACID_DH_1 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 8.66 %

SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc
MEM 



SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP 

SEG xxxxxxxxx 

PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS 

SEG 

PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee 



MEM 



SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID

SEG 

PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc 

MEM 

SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc 

MEM 

SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP 

SEG xxxxxxxxxxxxxxx. . .xxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc 

MEM 

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM 

SEG 

PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc 

MEM 



Prosite for DKFZphtes3_15c24.1
76->105 D_2_HYDROXYACID_DH_1 PDOC00063



(No Pfam data available for DKFZphtes3_15c24.1)






DKFZphtes3_15c6 



group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins.
The novel protein contains 1 transmembrane region.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1283 bp 

Poly A stretch at pos. 1264, no polyadenylation signal found 

1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 

51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 

101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 

151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 

201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGGCTTGGAG CCAGAGACAC 

251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 

301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 

351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 

401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 

451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT 

501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 

551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT

601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 

651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTCCTC CTATCCTCAG 

701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC 

751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 

801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 

851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 

901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 

951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 

1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 

1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 

1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 

1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 

1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 
1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 

1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 






BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 



PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 76, P = 0.33



>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana
Length = 258



Score = 76 (11.4 bits), Expect = 4.0e-01, P = 3.3e-01
Identities = 30/91 (32%), Positives = 44/91 (48%)

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74

PG GA P+ R+ F+ PF + +E+ A C P SSL+ A G L 

Sbjct: 52 PGRGA- PLARVTFRH PFRF KKQKELFVAAEVCTPVSSLYCGKKATLWGNVLP 103 

Query: 75 FSSYPQGPVLLCP HV-PLGCLVEALYNFSLVL 105 

S P+G V+ C HV G L A ++++V+ 
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYAIVI 137 

Pedant information for DKFZphtes3_15c6, frame 2 



Report for DKFZphtes3_15c6.2



[LENGTH] 118

[MW] 12413.79

[pI] 7.53

[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 1
[PROSITE] ASN_GLYCOSYLATION
[KW] TRANSMEMBRANE 1



SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF

PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc 

MEM 

SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP

PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc 

MEM MMMMMMMMMMMMMMMMM . 



Prosite for DKFZphtes3_15c6.2

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00008 70->76 MYRISTYL PDOC00008
PS00029 84->106 LEUCINE_ZIPPER PDOC00029
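
The N-glycosylation entry in the table above can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE PS00001 pattern N-{P}-[ST]-{P} and reports 1-based start positions. A lookahead is used so that overlapping sites are not missed. The name `find_asn_glycosylation` is illustrative. Note that the listing gives the range as 100->104 for a four-residue motif, so its end coordinate appears to be one past the last matched residue; that reading is an inference from the table, not a documented convention.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}.
# The lookahead keeps matches zero-width so overlapping sites are all found.
ASN_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")

# 118-residue peptide of DKFZphtes3_15c6, frame 2, in blocks of ten.
PEPTIDE = (
    "MVAIPPSACL" "PACCPGHGAV" "PVPRIGFKFV" "NNFPFGLVDV" "NRAREVLPTA"
    "CACLPASSLF" "SFHYAPSPGG" "LALSFSSYPQ" "GPVLLCPHVP" "LGCLVEALYN"
    "FSLVLCSFLL" "YFPAVSCP"
)


def find_asn_glycosylation(peptide: str) -> list[int]:
    """Return 1-based start positions of N-{P}-[ST]-{P} matches."""
    return [match.start() + 1 for match in ASN_GLYCOSYLATION.finditer(peptide)]


print(find_asn_glycosylation(PEPTIDE))
```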



(No Pfam data available for DKFZphtes3_15c6.2)
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DKFZphtes3_15gl4 



group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae
hypothetical protein YOR243c.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to YOR243c 

complete cDNA, complete cds, potential start codon at Bp 35, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3495 bp 

Poly A stretch at pos. 3462, no polyadenylation signal found 



1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 
51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 
101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 
151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 
201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 
251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA 
301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 
351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 
401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 
451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 
501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 
551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 
601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 
651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 
701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 
751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 
801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 
851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 
901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 
951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 
1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 
1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 
1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 
1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 
1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 
1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 
1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 
1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 
1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT 
1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC 
1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 
1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 
1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 
1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 
1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 
1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 
1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 
1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA 
1901 CTACAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG 
1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA 
2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA 
2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC 
2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC 
2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA 
2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC 
2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT 
2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC 
2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG 
2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT 
2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC 
2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG 
2551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA 
2601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT 
2651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA 
2701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA 
2751 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGC 
2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG 
2851 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC 
2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA 
2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA 
3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG 
3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA 
3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA 
3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG 
3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG 
3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT 
3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA 
3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA 
3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG 
3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 
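
The ORF coordinates and the peptide length can be cross-checked with one line of arithmetic. The sketch below assumes the coordinates are 1-based, inclusive and exclude the stop codon, which is what the numbers in this listing suggest; the function name is illustrative.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF of this clone: 35 bp to 2137 bp.
print(peptide_length(35, 2137))
```

The same check can be applied to the other ORFs in this listing, for example 69 to 2084 bp or 144 to 2294 bp.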



1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI 
51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN 
101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS 
151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN 
201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN 
251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV 
301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV 
351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK 
401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL 
451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA 
501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY 
551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG 
601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN 
651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD 
701 V 
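
The start of the peptide can be checked against the cDNA by translating from the potential start codon at bp 35. This is a minimal sketch using the standard genetic code; the helper names are illustrative, and only the first 100 bases of the insert are included.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 1-100 of the insert, copied from the first two sequence rows.
CDNA_START = (
    "GCCTTCCACTGAACCGAGGCACTGTTATAGAAGAATGGAAGAAGATACAG"
    "ATTATAGAATCAGGTTTAGTTCTTTGTGTTTCTTTAATGATCACGTTGGA"
)
ORF_START = 35  # 1-based position of the potential start codon

print(translate(CDNA_START[ORF_START - 1:]))
```

The printed residues should agree with the first residues of the frame 2 peptide listed above.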



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15gl4, frame 2 

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp 
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57 

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 516, P = 7.3e-54 

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN 
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34 



>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676 

HSPs: 

Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 151/498 (30%), Positives = 245/498 (49%) 
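
The percentages in parentheses follow from the match counts. Judging by the values in this listing (for example 30/91 shown as 32%), the percentage appears to be truncated rather than rounded; that is an inference from the numbers, not a documented rule, and the function name below is illustrative.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated, as the report appears to print it."""
    return matches * 100 // aligned


# Identities and positives of the first HSP against YOR243c.
print(blast_percent(151, 498), blast_percent(245, 498))
```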

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249 
+ E V P L +L + EE+ Y A K + F+ +K R +H + 
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE---DKSVRTKIHQLL 164 

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307 
+ F N +E+ + N +EK ++ R + G + FTL 
Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224 

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366 
KEN + EA+ + KL +PS YAG KD++A+T Q + + K+ +RL + + + 
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282 

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426 
K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+ 
Sbjct: 283 -KGMIIGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340 

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485 
NY+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA 
Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399 

Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ--AWFS----LPHSMRIFYVHAYTSKIW 539 
L MP + E ALL +L E+G A+++ +P ++R YVHAY S +W 
Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459 

Query: 540 NEAVSYRLETYGARVVQGDLVC-------LDEDIDDENFPNS-------KIHLVTEEEGS 585 
N S R+E +G ++V GDLV L IDDE+F + VT+E+ 
Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519 

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644 
+ Y + VVLP G+++ YP N+ + Q Y DIL D + + ++ G YR + 
Sbjct: 520 SVKYTMEDVVLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579 

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671 
+ + P +L Y+++ D + .. + +D 
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606 

Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01 
Identities = 40/160 (25%), Positives = 77/160 (48%) 

Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81 
GF G IK +DF+V EID++G++++ T D+ FK+ + +P K +++ + S E 
Sbjct: 55 GFRGQIKQRYTDFLVNEIDQEGKVIHLT-DKG-FKMPK---KPQR--SKEEVNAEKES-E 106 

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS--EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138 
R QE + D + +Q +ED + ++ + K + + F D+ ++ 
Sbjct: 107 AARRQEFNV-----DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKNFEDKSVRTKIH 161 

Query: 139 NFACDVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181 
+RE + ++ E + FIR ++N R + I Q 
Sbjct: 162 QL---LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201 

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 10/23 (43%), Positives = 17/23 (73%) 

Query: 676 SLLISFDLDASCYATVCLKEIMK 698 
++++ F L S YAT + L+E+MK 
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660 

Pedant information for DKFZphtes3_15gl4, frame 2 



Report for DKFZphtes3_15gl4 .2 

[LENGTH] 701 

[MW] 80700.96 

[pI] 7.31 

[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e-51 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53 

[BLOCKS] BL01266C 

[BLOCKS] BL01268B 

[BLOCKS] BL01268A 

[SUPFAM] hypothetical protein HI0701 3e-06 

[PROSITE] MYRISTYL 7 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 13 

[PROSITE] ASN_GLYCOSYLATION 5 

[KW] Alpha_Beta 

SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI 

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR 

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhececccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD 

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhcccpcceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV 

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN 

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhhceeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN 

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG 

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeccccccccccccceecccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD 

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV 

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 
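
The [KW] class of the report (here Alpha_Beta) summarizes the PRD rows, in which each residue is labelled h (helix), e (strand) or c (coil). A minimal sketch for tallying those states is shown below; the function name is illustrative, the example row is the last PRD row as printed, and the thresholds PEDANT uses to assign a class are not given in the listing.

```python
from collections import Counter


def composition(prd: str) -> dict[str, float]:
    """Fraction of helix (h), strand (e) and coil (c) states in a PRD string."""
    counts = Counter(prd)
    total = len(prd)
    return {state: counts.get(state, 0) / total for state in "hec"}


# Last PRD row of the report, as printed.
LAST_ROW = "ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc"

for state, fraction in composition(LAST_ROW).items():
    print(state, f"{fraction:.2f}")
```

Concatenating all PRD rows before calling `composition` gives the whole-protein fractions.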



Prosite for DKFZphtes3_15gl4 .2 



PS00001 47->51 ASN_GLYCOSYLATION PDOC00001 
PS00001 77->81 ASN_GLYCOSYLATION PDOC00001 
PS00001 266->270 ASN_GLYCOSYLATION PDOC00001 
PS00001 404->408 ASN_GLYCOSYLATION PDOC00001 
PS00001 650->654 ASN_GLYCOSYLATION PDOC00001 
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 232->235 PKC_PHOSPHO_SITE PDOC00005 
PS00005 237->240 PKC_PHOSPHO_SITE PDOC00005 
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005 
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005 
PS00005 381->384 PKC_PHOSPHO_SITE PDOC00005 
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005 
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005 
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005 
PS00005 625->628 PKC_PHOSPHO_SITE PDOC00005 
PS00005 632->635 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006 
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006 
PS00006 211->215 CK2_PHOSPHO_SITE PDOC00006 
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 
PS00006 580->584 CK2_PHOSPHO_SITE PDOC00006 
PS00006 668->672 CK2_PHOSPHO_SITE PDOC00006 
PS00007 537->546 TYR_PHOSPHO_SITE PDOC00007 
PS00008 25->31 MYRISTYL PDOC00008 
PS00008 43->49 MYRISTYL PDOC00008 
PS00008 114->120 MYRISTYL PDOC00008 
PS00008 326->332 MYRISTYL PDOC00008 
PS00008 385->391 MYRISTYL PDOC00008 
PS00008 514->520 MYRISTYL PDOC00008 
PS00008 622->628 MYRISTYL PDOC00008 
PS00009 287->291 AMIDATION PDOC00009 
PS00009 436->440 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_15gl4.2) 

group: testes derived 

DKFZphtes3_15hl encodes a novel 672 amino acid protein with very weak similarity to several 
proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 

1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 

51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 

101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 

151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 

201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 

251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 

301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 

351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 

401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 

451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC 

501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 

551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA 

601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 

651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 

701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 

751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 

801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 

851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 

901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 

951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 

1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 

1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 

1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 

1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 

1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 

1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 

1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 

1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 

1401 CAGGCACAAG TGAAGCTGAG AGACTTCGAG TCAGCCGTGA ACAATTTTGA 

1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG 

1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 

1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 

1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 

1651 TGAGAGATGA GCCCGAGAAG GTGGTGAAGC AGTGGGACCA TAGTGAGGAT 

1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 

1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT 

1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 

1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 

1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 

1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 

2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 

2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 

2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 

2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 

2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 

2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 69 bp to 2084 bp; peptide length: 672 
Category: similarity to known protein 



1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL 
51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA 
101 LVFYHRGYKL RPDREFRVGI QKAQEAINNS VGSPSSIKLE NKGDLSFLSK 
151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY 
201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER 
251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV 
301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA 
351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF 
401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL 
451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV 
501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE 
551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL 
601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG 
651 KTQFGEIGET KKTGNEMEKE YE 



BLASTP hits 



Entry AF039202_1 from database TREMBL: 

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus 

Hsp70/Hsp90 organizing protein mRNA, complete cds. 

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160 

Entry AI09782_1 from database TREMBL: 

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain 

mRNA, complete cds. 

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623 

Entry S56658 from database PIR: 
stress-induced protein sti1 - soybean 

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153 



Alert BLASTP hits for DKFZphtes3_15hl , frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15hl, frame 3 



Report for DKFZphtes3_15hl . 3 



[LENGTH] 672 

[MW] 76655.61 

[pI] 5.49 

[HOMOL] PIR:S56658 stress-induced protein sti1 - soybean 6e-10 

[SUPFAM] tetratricopeptide repeat homology 1e-07 

[PROSITE] MYRISTYL 7 

[PROSITE] AMIDATION 3 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 15 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 11 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 4.76 % 



SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM 

SEG 

PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh 

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI 

SEG 

PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh 

SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS 



PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh 

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc 

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh 

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS 

SEG 

PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh 

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG 



PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh 
SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII 



PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh 

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD 

SEG x 

PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc 

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL 

SEG xxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh 

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET 

SEG 

PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc 



SEQ KKTGNEMEKEYE 

SEG 

PRD cccccccccccc 
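
The LOW_COMPLEXITY figure in the report (4.76 %) can be related to the SEG rows, where `x` marks a masked residue. The sketch below assumes the figure is simply masked residues divided by peptide length; the run lengths (18, 1 and 13) were counted by hand from the three non-empty SEG rows above and should be treated as an assumption, as should the function name.

```python
def low_complexity_percent(masked_runs: list[int], length: int) -> float:
    """Percentage of residues masked by SEG, rounded to two decimals."""
    return round(100.0 * sum(masked_runs) / length, 2)


# Hand-counted lengths of the three 'x' runs in the SEG rows (assumption),
# and the peptide length given in the report.
print(low_complexity_percent([18, 1, 13], 672))
```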



Prosite for DKFZphtes3_15hl . 3 



PS00001 128->132 ASN_GLYCOSYLATION PDOC00001 
PS00001 438->442 ASN_GLYCOSYLATION PDOC00001 
PS00004 265->269 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 605->609 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 613->617 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 636->640 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005 
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005 
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005 
PS00005 183->186 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 564->567 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00005 660->663 PKC_PHOSPHO_SITE PDOC00005 
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006 
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006 
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006 
PS00006 171->175 CK2_PHOSPHO_SITE PDOC00006 
PS00006 220->224 CK2_PHOSPHO_SITE PDOC00006 
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006 
PS00006 382->386 CK2_PHOSPHO_SITE PDOC00006 
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006 
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006 
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006 
PS00006 512->516 CK2_PHOSPHO_SITE PDOC00006 
PS00006 542->546 CK2_PHOSPHO_SITE PDOC00006 
PS00006 548->552 CK2_PHOSPHO_SITE PDOC00006 
PS00006 628->632 CK2_PHOSPHO_SITE PDOC00006 
PS00006 663->667 CK2_PHOSPHO_SITE PDOC00006 
PS00007 506->515 TYR_PHOSPHO_SITE PDOC00007 
PS00008 119->125 MYRISTYL PDOC00008 
PS00008 132->138 MYRISTYL PDOC00008 
PS00008 213->219 MYRISTYL PDOC00008 
PS00008 288->294 MYRISTYL PDOC00008 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 334->340 MYRISTYL PDOC00008 
PS00008 590->596 MYRISTYL PDOC00008 
PS00009 596->600 AMIDATION PDOC00009 
PS00009 603->607 AMIDATION PDOC00009 
PS00009 641->645 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_15hl.3) 

DKFZphtes3_15i5 



group: cell structure and motility 

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead 
proteins. 

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of 
flagella or axoneme and to the Strongylocentrotus purpuratus sea urchin spermatozoa protein 
p63. This protein is important for the maintenance of a planar form of sperm flagellar 
beating. In addition, the novel protein contains a transferrin signature 1 for iron-binding. 
The new protein seems to be a part of the human radial spoke heads in spermatozoa. 

BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in modulating the structure of the human spermatozoa 
radial spoke head and modulation of sperm motility in men. 



strong similarity to "radial spokehead" proteins 

complete cDNA, complete cds, 1 EST hit (from a testis library) 
"radial spokehead" part of flagella in Chlamydomonas; this protein 
seems to be part of the sperm motor or tail 

Sequenced by GBF 

Locus: unknown 

Insert length: 2478 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433 
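
A polyadenylation signal such as the one reported here can be located by searching the 3' end of the insert for the canonical hexamers. The sketch below assumes the annotation looked for AATAAA or ATTAAA, which the listing does not state; the function name is illustrative. It returns 1-based coordinates, so the reported position would agree with the match if the report uses a 0-based offset, which is also an assumption.

```python
import re


def find_polya_signals(seq: str, offset: int = 1) -> list[tuple[int, str]]:
    """Return (position, hexamer) pairs; position is 1-based relative to offset."""
    hits = []
    # Lookahead so that overlapping hexamers would also be reported.
    for m in re.finditer(r"(?=(AATAAA|ATTAAA))", seq):
        hits.append((m.start() + offset, m.group(1)))
    return hits


# Bases 2401-2450 of the insert, copied from the sequence listing of this clone.
TAIL = "GGGCAGGGATAGAAAAGGAAGGCAACTGCTTCAAATAAAATTCCTCCACG"

print(find_polya_signals(TAIL, offset=2401))
```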



1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG 
51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG 
101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG 
151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG 
201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT 
251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA 
301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG 
351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA 
401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT 
451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG 
501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC 
551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC 
601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT 
651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG 
701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA 
751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG 
801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC 
851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC 
901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC 
951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC 
1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA 
1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC 
1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA 
1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC 
1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA 
1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC 
1301 GCACGGCGAG GAGGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG 
1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC 
1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT 
1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC 
1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC 
1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC 
1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA 
1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC 
1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA 
1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC 
1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG 
1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA 
1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC 
1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC 
2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC 
2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA 
2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC 
2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA 
2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG 
2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC 
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA 
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA 
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG 
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



No BLAST result 



Medline entries 



86251010: 

Molecular cloning and expression of flagellar radial spoke and dynein 
genes of Chlamydomonas. 

81142496: 

Radial spokes of Chlamydomonas flagella: polypeptide composition and 
phosphorylation of stalk components. 

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 



Peptide information for frame 3 



ORF from 144 bp to 2294 bp; peptide length: 717 
Category: strong similarity to known protein 



1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA 

51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP 

101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN 

151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ 

201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE 

251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP 

301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG 

351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV 

401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV 

451 NARKIKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY 

501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP 

551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE 

601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH 

651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT 

701 EEEEEGEEEE EGEETDD 



BLASTP hits 



Entry U73123_1 from database TREMBL: 

product: "radial spokehead"; Strongylocentrotus purpuratus radial 
spokehead mRNA, complete cds. 

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523 

Entry B44498 from database PIR: 

radial spoke protein 6 - Chlamydomonas reinhardtii 

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264 



Alert BLASTP hits for DKFZphtes3_15i5 , frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15i5, frame 3 



Report for DKFZphtes3_15i5 . 3 



[LENGTH] 717 

[MW] 80913.61 

[pI] 4.36 

[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds. 1e-130 

[PROSITE] TRANSFERRIN 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 

[PROSITE] ASN_GLYCOSYLATION 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 



SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR 

SEG . - - • xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccccccccccccc 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG xxxx 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhhhhhh 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhccccchhhh 

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP 

SEG xxxxxxxxxxxxxxxx. . 

PRD hhhchhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc 

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV 

SEG xxx 

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhhhhhcccchhhhhhhh 

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeecccccccccccccccccc 

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY 

SEG 

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhcccccccccccccccchhhh 

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecchhhh 

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc 

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA 

SEG 

PRD cccccccccccccccccccceeeeeeccccceceecccccceeeeeeccccccccccccc 

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD 

SEG xxxxxxxxxxxxxx . . . xxxxxxxxxxxxxx . . . 

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc



PS00001   244->248   ASN_GLYCOSYLATION    PDOC00001
PS00002   282->286   GLYCOSAMINOGLYCAN    PDOC00002
PS00004    18->22    CAMP_PHOSPHO_SITE    PDOC00004
PS00004    26->30    CAMP_PHOSPHO_SITE    PDOC00004
PS00005    24->27    PKC_PHOSPHO_SITE     PDOC00005
PS00005    58->61    PKC_PHOSPHO_SITE     PDOC00005
PS00005   258->261   PKC_PHOSPHO_SITE     PDOC00005
PS00005   268->271   PKC_PHOSPHO_SITE     PDOC00005
PS00005   323->326   PKC_PHOSPHO_SITE     PDOC00005
PS00005   341->344   PKC_PHOSPHO_SITE     PDOC00005
PS00005   608->611   PKC_PHOSPHO_SITE     PDOC00005
PS00005   637->640   PKC_PHOSPHO_SITE     PDOC00005
PS00006    64->68    CK2_PHOSPHO_SITE     PDOC00006
PS00006   137->141   CK2_PHOSPHO_SITE     PDOC00006
PS00006   216->220   CK2_PHOSPHO_SITE     PDOC00006
PS00006   238->242   CK2_PHOSPHO_SITE     PDOC00006
PS00006   247->251   CK2_PHOSPHO_SITE     PDOC00006
PS00006   258->262   CK2_PHOSPHO_SITE     PDOC00006
PS00006   286->290   CK2_PHOSPHO_SITE     PDOC00006
PS00006   319->323   CK2_PHOSPHO_SITE     PDOC00006
PS00006   503->507   CK2_PHOSPHO_SITE     PDOC00006
PS00006   519->523   CK2_PHOSPHO_SITE     PDOC00006
PS00006   563->567   CK2_PHOSPHO_SITE     PDOC00006
PS00006   671->675   CK2_PHOSPHO_SITE     PDOC00006
PS00006   682->686   CK2_PHOSPHO_SITE     PDOC00006
PS00006   700->704   CK2_PHOSPHO_SITE     PDOC00006
PS00007   639->646   TYR_PHOSPHO_SITE     PDOC00007
PS00008   284->290   MYRISTYL             PDOC00008
PS00008   315->321   MYRISTYL             PDOC00008
PS00008   350->356   MYRISTYL             PDOC00008
PS00008   435->441   MYRISTYL             PDOC00008
PS00008   475->481   MYRISTYL             PDOC00008
PS00009    16->20    AMIDATION            PDOC00009
PS00009   637->641   AMIDATION            PDOC00009
PS00205   619->628   TRANSFERRIN_1        PDOC00182

(No Pfam data available for DKFZphtes3_15i5.3)

DKFZphtes3_15j18
group: testes derived

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, few EST hits 
Sequenced by GBF 
Locus: unknown 
Insert length: 905 bp 

Poly A stretch at pos. 839, polyadenylation signal at pos. 815 
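
The two positions quoted above can be reproduced from the last full line of the insert sequence (bases 801-850). The following is a minimal sketch, not the annotation pipeline itself: it assumes the canonical AATAAA hexamer as the polyadenylation signal, treats a run of at least ten A residues as the poly A stretch, and assumes that the reported positions are 0-based offsets (the number of bases preceding the feature). The function name and the `min_run` threshold are illustrative choices.

```python
import re

def find_polya_features(seq, offset=0, min_run=10):
    """Locate a poly A stretch and the nearest upstream AATAAA signal.

    Returns (signal_pos, polya_pos) as 0-based offsets shifted by `offset`;
    either value is None when the feature is not found.
    """
    seq = seq.upper()
    run = re.search("A{%d,}" % min_run, seq)      # first run of >= min_run A's
    if run is None:
        return None, None
    # Search only upstream of the poly A stretch, keeping the closest signal.
    signal = seq.rfind("AATAAA", 0, run.start())
    signal_pos = offset + signal if signal >= 0 else None
    return signal_pos, offset + run.start()

# Bases 801-850 of the DKFZphtes3_15j18 insert, so 800 bases precede this fragment.
tail = "GTTATGAATGTGGTCAATAAATGTTATTTGTGAAACTCTAAAAAAAAAAA"
print(find_polya_features(tail, offset=800))
```

Under these assumptions the sketch returns 815 for the signal and 839 for the poly A stretch, matching the values listed for this clone.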

1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 

51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGCAGGG CACAGCTTGT 

101 CCAGCTGGGA TGTTTGGGTG CCCTGTCAGA TGCCCCAAGC CACCAACCCA 

151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 

201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG 

251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC 

301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 

351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 

401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA 

451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG 

501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 

551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA 

601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 

651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 

701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 

751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT 

801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 

851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA 

901 AAAAG 

BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 110 bp to 553 bp; peptide length: 148
Category: putative protein 
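
The peptide length follows directly from the ORF coordinates. A minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range (the function name is illustrative):

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3

print(orf_peptide_length(110, 553))   # ORF of DKFZphtes3_15j18, frame 2
```

For the coordinates above, (553 - 110 + 1) / 3 gives 148 residues.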



1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 
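
The start of this peptide can be checked against the nucleotide sequence. The sketch below translates bases 110-139 of the insert with the standard genetic code; the helper names are illustrative, and the codon table is built from the conventional TCAG ordering.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with the first base varying slowest.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

orf_start = "ATGTTTGGGTGCCCTGTCAGATGCCCCAAG"   # bases 110-139 of the insert
print(translate(orf_start))
```

The ten codons translate to MFGCPVRCPK, the first ten residues of the peptide listed above.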

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15j18, frame 2

Report for DKFZphtes3_15j18.2



[LENGTH] 148
[MW] 15665.78
[pI] 8.91
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 1
[KW] Irregular



SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 

PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 

PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 

PRD cccccccccccccccccccccccccccc 



Prosite for DKFZphtes3_15j18.2



PS00006    82->86    CK2_PHOSPHO_SITE     PDOC00006
PS00008    38->44    MYRISTYL             PDOC00008
PS00008    42->48    MYRISTYL             PDOC00008
PS00008    49->55    MYRISTYL             PDOC00008
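
The four Prosite hits for this peptide can be reproduced with a simple pattern scan. This is a sketch only: it assumes the Prosite patterns [ST]-x(2)-[DE] for PS00006 (CK2_PHOSPHO_SITE) and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for PS00008 (MYRISTYL), rewritten here as regular expressions, and it assumes the reported ranges are written as a 1-based start followed by start plus pattern length. Overlapping matches are collected with a lookahead, which the two overlapping MYRISTYL hits require.

```python
import re

PEPTIDE = ("MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI"
           "SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH"
           "CGAPLPSAHGGFPRARAEGSWSQPGAGS")

# Prosite patterns rewritten as regular expressions (assumed forms, see above).
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

def scan(peptide, regex):
    """Return (start, end) pairs for all matches, including overlapping ones."""
    hits = []
    for m in re.finditer("(?=(%s))" % regex, peptide):
        start = m.start() + 1                      # 1-based start residue
        hits.append((start, start + len(m.group(1))))
    return hits

for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

With these assumptions the scan yields one CK2 site at 82->86 and three myristoylation sites at 38->44, 42->48 and 49->55, in agreement with the table.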



(No Pfam data available for DKFZphtes3_15j18.2)

DKFZphtes3_15j3



group: nucleic acid management

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins of
unknown function.

The novel protein contains an RNA recognition motif, predicted by Pfam, and therefore binds
RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein
seems to be a new RNA-modifying protein.

The new protein can find application in modulating RNA metabolism in human cells and as a
tool for biotechnological manipulations.



"44M2.3"; product, differences to genmodel, similarity to ribonuclease H

complete cDNA, complete cds, EST hits 
YGR276c - ribonuclease H 
differences to genmodel of 44M2.3 

Sequenced by GBF 

Locus: /map="16p11.2"

Insert length: 2695 bp 

Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579 



1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 

51 GCCCGTCTTC TCTTCTGCGT TTCCCGGGCT AGGGGGCGTG GGGAGTGGTT 

101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 

151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 

201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 

251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT 

301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 

351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 

401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 

451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA 

501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA 

551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT 

601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 

651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 

701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 

751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 

801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 

851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC

901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 

951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 

1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC 

1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG 

1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG 

1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG 

1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA 

1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA 

1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC 

1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG 

1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT 

1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG 

1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA 

1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC 

1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT 

1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT 

1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG 

1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT 

1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG 

1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA 

1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 

1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 

2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 

2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 

2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 

2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 

2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC 

2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 

2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA 

2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG 

2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC 
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2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC 

2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA 

2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG 

2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2



ORF from 188 bp to 2416 bp; peptide length: 743
Category: similarity to known protein 



1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL 
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA 
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF 
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV 
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL 
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI 
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE 
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL 
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW 
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j3, frame 2

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product";
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence.,
N = 2, Score = 1827, P = 2.1e-284

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid
C05C8., N = 2, Score = 370, P = 1.7e-34

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 334, P = 1.8e-27

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative
exonuclease"; S.pombe chromosome I cosmid c637., N = 3, Score = 326, P
= 2.8e-27

>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence.
Length = 547



Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284
Identities = 358/373 (95%), Positives = 358/373 (95%)

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164
           MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN
Sbjct:   1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224
           AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD
Sbjct:  61 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120

Query: 225 NSPLFGLDCEM---------------CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269
           NSPLFGLDCEM               CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329
           DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389
           SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449
           LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360

Query: 450 NCQTIKCLSNKEV 462
           NCQTIKCLSNKEV
Sbjct: 361 NCQTIKCLSNKEV 373

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284
Identities = 175/179 (97%), Positives = 177/179 (98%)

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597
           L  ++VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657
           ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487

Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716
           STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546
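
The percentages printed next to the identity and positive counts are consistent with integer truncation rather than rounding (358/373 is 95.98 %, reported as 95 %). A minimal sketch of that arithmetic, assuming truncation is indeed what the program applied; the function name is illustrative:

```python
def blast_percent(matches, aligned_length):
    """Percentage of matching columns, truncated to a whole number."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length

# Counts taken from the two segment pairs reported for TREMBL:AC004381_4.
for matches, length in [(358, 373), (175, 179), (177, 179)]:
    print("%d/%d (%d%%)" % (matches, length, blast_percent(matches, length)))
```

The first segment pair spans 373 alignment columns (six blocks of 60 plus one of 13), of which 15 are gap columns in the query, which accounts for the 358 identical positions.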



Pedant information for DKFZphtes3_15j3, frame 2
Report for DKFZphtes3_15j3.2



[LENGTH] 743
[MW] 83536.58
[pI] 8.87
[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094c] 1e-10
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094c] 1e-10
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] rna recognition motif. (aka RRM, RBD, or RNP domain)
[KW] Alpha_Beta
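
Report lines of the form "[FUNCAT] code description [organism, ORF] e-value" lend themselves to mechanical parsing. The following sketch assumes exactly that one-line layout; the `FuncatHit` record and the regular expression are illustrative and are not part of the PEDANT software.

```python
import re
from typing import NamedTuple

class FuncatHit(NamedTuple):
    code: str          # functional category number, e.g. "01.03.16"
    description: str   # category description
    organism: str      # organism of the best-matching protein
    orf: str           # systematic ORF name of that protein
    evalue: float      # e-value of the match

FUNCAT_RE = re.compile(
    r"^\[FUNCAT\]\s+(\S+)\s+(.*?)\s+\[(.+?),\s*(\S+)\]\s+(\S+)$")

def parse_funcat(line):
    """Parse one [FUNCAT] report line; return None for any other line."""
    m = FUNCAT_RE.match(line.strip())
    if m is None:
        return None
    code, description, organism, orf, evalue = m.groups()
    return FuncatHit(code, description, organism, orf, float(evalue))

hit = parse_funcat(
    "[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30")
print(hit)
```

Lines carrying other tags, such as [PROSITE] or [KW], are returned as None and would need their own patterns.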



SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehhhhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh

SEQ LPPDAVLVGHSLDLDLRALKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc 

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG 

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc

SEQ SLSGLGLMGIKEEEESAGPGLCS

PRD cccccccchhhhhhccccccccc 



Prosite for DKFZphtes3_15j3.2 



PS00001   219->223   ASN_GLYCOSYLATION    PDOC00001
PS00001   419->423   ASN_GLYCOSYLATION    PDOC00001
PS00002   723->727   GLYCOSAMINOGLYCAN    PDOC00002
PS00005     8->11    PKC_PHOSPHO_SITE     PDOC00005
PS00005   182->185   PKC_PHOSPHO_SITE     PDOC00005
PS00005   238->241   PKC_PHOSPHO_SITE     PDOC00005
PS00005   279->282   PKC_PHOSPHO_SITE     PDOC00005
PS00005   287->290   PKC_PHOSPHO_SITE     PDOC00005
PS00005   447->450   PKC_PHOSPHO_SITE     PDOC00005
PS00005   453->456   PKC_PHOSPHO_SITE     PDOC00005
PS00005   458->461   PKC_PHOSPHO_SITE     PDOC00005
PS00005   481->484   PKC_PHOSPHO_SITE     PDOC00005
PS00005   579->582   PKC_PHOSPHO_SITE     PDOC00005
PS00005   605->608   PKC_PHOSPHO_SITE     PDOC00005
PS00005   630->633   PKC_PHOSPHO_SITE     PDOC00005
PS00005   643->646   PKC_PHOSPHO_SITE     PDOC00005
PS00005   658->661   PKC_PHOSPHO_SITE     PDOC00005
PS00005   678->681   PKC_PHOSPHO_SITE     PDOC00005
PS00005   692->695   PKC_PHOSPHO_SITE     PDOC00005
PS00006    41->45    CK2_PHOSPHO_SITE     PDOC00006
PS00006   193->197   CK2_PHOSPHO_SITE     PDOC00006
PS00006   221->225   CK2_PHOSPHO_SITE     PDOC00006
PS00006   371->375   CK2_PHOSPHO_SITE     PDOC00006
PS00006   421->425   CK2_PHOSPHO_SITE     PDOC00006
PS00006   458->462   CK2_PHOSPHO_SITE     PDOC00006
PS00006   579->583   CK2_PHOSPHO_SITE     PDOC00006
PS00006   630->634   CK2_PHOSPHO_SITE     PDOC00006
PS00007   370->379   TYR_PHOSPHO_SITE     PDOC00007
PS00008    27->33    MYRISTYL             PDOC00008
PS00008   186->192   MYRISTYL             PDOC00008
PS00008   575->581   MYRISTYL             PDOC00008
PS00008   714->720   MYRISTYL             PDOC00008
PS00008   720->726   MYRISTYL             PDOC00008
PS00009   337->341   AMIDATION            PDOC00009



Pfam for DKFZphtes3_15j3.2 



HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM * I YVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 

IY+ +++ +T +E+L + + F + + + +++D G+ + ++F +F++ 

Query 571 iylsgvs-etfkeqllqeprlflgleavilpkdlksgkqkkycflkfks 

HMM EEDAekAIdeMNG. .itieFmGRrlRV* 

+A+ A+ + G ++ GR + 
Query 619 FGSAQQALNILTGKDwKLKGRHALT 643

DKFZphtes3_15k11



group: signal transduction

DKFZphtes3_15k11 encodes a novel 958 amino acid protein that is C-terminally identical to human
KIAA0781 protein and highly similar to protein kinases.

The novel protein contains a protein kinase ATP-binding region signature and a 
serine/threonine protein kinase active-site signature. The related murine kinase was cloned 
from the myocardium of the developing heart. 

The new protein can find application in modulation of intracellular signal pathways dependent 
on this kinase. 

KIAA0781, 5' extension

complete cDNA, complete cds, potential start at Bp 97, EST hits 
Sequenced by GBF 
Locus: /map="11"
Insert length: 4868 bp

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 



1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 
51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGGGG CCCAGCATGG 
101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 
151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 
201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 
251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 
301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 
351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 
401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 
451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG 
501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 
551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 
601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 
651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 
701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 
751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 
801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 
851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 
901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 
951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 
1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 
1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 
1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 
1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA 
1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 
1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC 
1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 
1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 
1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 
1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 
1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 
1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 
1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 
1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 
1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 
1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG 
1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 
1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC 
1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 
1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 
2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 
2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 
2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 
2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 
2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 
2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 
2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 
2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 
2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT 
2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC 
2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA 
2551 CAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCCACCAC CACGACAGCC 
2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG 
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG 
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 
2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA 
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 
3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG 
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA 
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG 
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC 
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA 
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC 
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA 
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT 
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 
3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT 
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 
4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT 
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA 
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT 
4151 TATATTACTA ATAAAACTTA ACCAACAGTT ACAATTCAGT CATCAAAGTA 
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA 
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT 
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC 
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG 
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT 
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC 
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG 
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA 
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA 
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT 
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA 
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA 
4851 CGCGTGAAAA AAAAAAAG 



BLAST Results 



Entry HSG4921 from database EMBL:
human STS SHGC-37164.
Score = 1605, P = 1.9e-66, identities = 349/369

Entry AB018324 from database EMBL:

Homo sapiens mRNA for KIAA0781 protein, partial cds.
Score = 10725, P = 0.0e+00, identities = 2145/2145



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 2874 bp; peptide length: 958
Category: known protein 



1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG 
51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV 
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD 
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA 
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV 
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD 
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ 
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ 
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ 
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ 
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF 
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL 
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ 
951 HNGYVLVN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15k11, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15k11, frame 1



Report for DKFZphtes3_15k11.1



[LENGTH] 926
[MW] 103915.77
[pI] 5.70
[HOMOL] TREMBL:AB018324_1 gene: "KIAA0781"; product: "KIAA0781 protein"; Homo sapiens mRNA for KIAA0781 protein, partial cds. 0.0



[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 4e-58
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-56
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 3e-56
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 1e-53
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 5e-42
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-34
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 3e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 2e-23
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 3e-19
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 4e-18
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-16
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 2e-14
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 2e-14
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 5e-14
[FUNCAT] c energy conversion [M. genitalium, MG109] 2e-12
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 4e-09
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 1e-07
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 1e-07
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL154c] 2e-04

[BLOCKS] BL00415A Synapsins proteins 

[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat (Rattus norvegicus 3e-78 

[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 1e-81 

[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 5e-89 

[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 5e-86 

[SCOP] d1phk_ 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 3e-80 

[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 6e-70 

[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-95 

[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 7e-71 

[SCOP] d1ydse_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 2e-96 

[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 2e-72 

[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 5e-97 

[SCOP] d2hckb3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 2e-68 

[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53 

[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 3e-78 

[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 1e-58 

[EC] 2.7.1.117 Myosin-light-chain kinase 3e-49 

[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH) ] kinase 4e-78 

[EC] 2.7.1.38 Phosphorylase kinase 3e-41 

[EC] 2.7.1.37 Protein kinase 7e-45 

[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42 

[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78 

[PIRKW] phosphotransferase 3e-93 

[PIRKW] nucleus 2e-74 

[PIRKW] calcium 2e-40 

[PIRKW] transferase 3e-33 

[PIRKW] duplication 2e-32 

[PIRKW] tandem repeat 7e-45 

[PIRKW] phorbol ester binding 4e-33 

[PIRKW] zinc 4e-33 

[PIRKW] ion transport 1e-32 

[PIRKW] cell cycle control 1e-45 

[PIRKW] serine/threonine-specific protein kinase 2e-97 

[PIRKW] oncogene 1e-34 

[PIRKW] phospholipid binding 2e-32 

[PIRKW] autophosphorylation 2e-74 

[PIRKW] brain 6e-36 

[PIRKW] heterotetramer 8e-38 

[PIRKW] mitosis 1e-45 

[PIRKW] polymer 5e-41 

[PIRKW] magnesium 6e-80 

[PIRKW] ATP 2e-97 

[PIRKW] polyprotein 1e-34 

[PIRKW] alternative initiators 2e-31 

[PIRKW] phosphoprotein 2e-74 

[PIRKW] apoptosis 8e-38 

[PIRKW] cGMP binding 4e-33 

[PIRKW] glycoprotein 3e-36 

[PIRKW] skeletal muscle 8e-38 

[PIRKW] protein kinase 2e-50 

[PIRKW] testis 5e-41 

[PIRKW] cAMP binding 8e-38 

[PIRKW] transforming protein 4e-33 

[PIRKW] purine nucleotide binding 7e-52 

[PIRKW] calcium binding 7e-45 

[PIRKW] alternative splicing 5e-42 

[PIRKW] P-loop 7e-52 

[PIRKW] lipoprotein 8e-38 

[PIRKW] proto-oncogene 4e-33 

[PIRKW] segmentation 1e-34 

[PIRKW] core protein 1e-34 

[PIRKW] muscle 8e-38 

[PIRKW] myristylation 8e-38 

[PIRKW] EF hand 7e-45 

[PIRKW] cell division 3e-49 

[PIRKW] homodimer 1e-32 

[PIRKW] calmodulin binding 5e-42 

[SUPFAM] ribosomal protein S6 kinase II 1e-34 

[SUPFAM] calcium-dependent protein kinase 7e-45 

[SUPFAM] AMP-activated protein kinase 6e-80 

[SUPFAM] protein kinase akt 3e-36 

[SUPFAM] protein kinase SPK1 7e-41 

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99 

[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42 

[SUPFAM] calmodulin repeat homology 7e-45 

[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33 

[SUPFAM] protein kinase DUN1 6e-36 

[SUPFAM] protein kinase C zeta 4e-33 

[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34 

[SUPFAM] death-associated protein kinase 8e-38 

[SUPFAM] pleckstrin repeat homology 3e-36 

[SUPFAM] ankyrin repeat homology 8e-38 

[SUPFAM] protein kinase homology 8e-99 

[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38 

[SUPFAM] protein kinase C zinc-binding repeat homology 4e-33 

[SUPFAM] protein kinase C delta 2e-32 

[SUPFAM] cGMP-dependent protein kinase 3e-33 

[SUPFAM] protein kinase cdr1 1e-45 

[SUPFAM] kinase-related transforming protein 2e-50 

[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-42 

[SUPFAM] kinase interaction domain homology 7e-41 

[SUPFAM] gag-akt polyprotein 1e-34 

[PROSITE] PROTEIN_KINASE_ATP 1 

[PROSITE] MYRISTYL 3 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 15 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 2 

[PROSITE] PROTEIN_KINASE_ST 1 

[PFAM] Eukaryotic protein kinase domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 12.31 % 



SEQ MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVN 

SEG 

IctpE EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC 

SEQ LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLHESEAR 

SEG 

IctpE HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHH 

SEQ RKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFKSGELLATWCGSP 

SEG 

IctpE HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC-CCCCCG 

SEQ PYAAPEVFEGQQYEGPQLDIWSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRIPYFM 

SEG 

IctpE GGCCHHHHHCCCBC-HHHHHHHHHHHHHHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTT 

SEQ SEDCEHLIRRMLVLDPSKRLTIAQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV 

SEG 

IctpE CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG 

SEQ LRLMHSLGIDQQKTIESLQNKSYHHFAAIYFLLVERLKSHRSSFPVEQRLDGRQRRPSTI 

SEG 

IctpE 

SEQ AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT 

SEG 

IctpE 

SEQ PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ 

SEG xxxxxxxxxxx 

IctpE 

SEQ RRHTLSEVTNQLVVHPGAGKIFSMNDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML 

SEG 

IctpE 

SEQ ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRSPVSFREGRRASDTSLTQGIVAFRQ 

SEG 

IctpE 

SEQ HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST 

SEG xxxxxxxxxxxxxxxx....xxxxxxxxxxxx. 

IctpE 

SEQ LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK 

SEG xxxxxxxxxxxxx 

IctpE 

SEQ RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP 

SEG xxxxxxxxxxx xxxxxxxxxxxxxxx 

IctpE 

SEQ SSEQMQYSPFLSQYQEMQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

IctpE 

SEQ PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL 

SEG xxx 

IctpE 

SEQ SELPGLFDCEMLDAVDPQHNGYVLVN 

SEG 



Prosite for DKFZphtes3_15k11.1 



PS00001  115->119  ASN_GLYCOSYLATION   PDOC00001 
PS00001  320->324  ASN_GLYCOSYLATION   PDOC00001 
PS00004  258->262  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  355->359  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  481->485  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004  584->588  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005  257->260  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  339->342  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  420->423  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  475->478  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  534->537  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  545->548  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  554->557  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  567->570  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  579->582  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  670->673  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   42->46   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   54->58   CK2_PHOSPHO_SITE    PDOC00006 
PS00006  128->132  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  292->296  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  359->363  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  394->398  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  450->454  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  458->462  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  484->488  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  503->507  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  515->519  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  534->538  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  579->583  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  878->882  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  893->897  CK2_PHOSPHO_SITE    PDOC00006 
PS00007  672->680  TYR_PHOSPHO_SITE    PDOC00007 
PS00007  100->108  TYR_PHOSPHO_SITE    PDOC00007 
PS00008  372->378  MYRISTYL            PDOC00008 
PS00008  871->877  MYRISTYL            PDOC00008 
PS00008  905->911  MYRISTYL            PDOC00008 
PS00009  134->138  AMIDATION           PDOC00009 
PS00009  582->586  AMIDATION           PDOC00009 
PS00107   26->50   PROTEIN_KINASE_ATP  PDOC00100 
PS00108  138->151  PROTEIN_KINASE_ST   PDOC00100 



Pfam for DKFZphtes3_15k11.1 



HMM_NAME Eukaryotic protein kinase domain 






WO 01/12659 



HMM *YeigRiIGeGsFGtVYkCiWr.TGeIVAIKIIk)crsms FIRE I 

Y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+ 

Query 20 YDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVNLEKIYREV 68 

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYI r rngpMsEw 

QIM++L+HP+II++Y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E 
Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 117 

HMM elr f IMyQILrGMeYLHSMgllHRDLKPENILIDeNgqIKIcDFGLARqM 

E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++ 
Query 118 EARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDKNMNIKIADFGFGHFF 167 

HMM nnYerMttf CGTPWYMMAPEVI Img . nyYttkVDMWSFGCILWEMMTGep 

+++E++ T CG+P+Y APEV +G +Y +++ D+WS+G++L+ +++G + 
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 215 

HMM PFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 
PF++ ++ + + +++ R+++++ +SE++ +++R+++ +DP+KR+T+ QI 
Query 216 PFDGPTLPILRQRVLEGRFRIPYFMSEDCEHLIRRMLVLDPSKRLTIAQI 265 

HMM LnHPWF* 
+H W+ 

Query 266 KEHKWM 271 



PCT/IB00/01496 

DKFZphtes3_17f10 
group: testes derived 

DKFZphtes3_17f10 encodes a novel 710 amino acid protein with weak similarity to neurofilament 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to neurofilament proteins 

Sequenced by GBF 

Locus: unknown 

Insert length: 2533 bp 

Poly A stretch at pos. 2507, no polyadenylation signal found 

1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 
51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 

101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 

151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 

201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 

251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 

301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 

351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 

401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 

451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 

501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 

551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 

601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 

651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 

701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 

751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 

801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 

851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG 

901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 

951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 
1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 
1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 
1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 
1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 
1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 
1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 
1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 
1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 
1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 
1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 
1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 
1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 
1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 
1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 
1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 
1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 
1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 
1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 
1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 
1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 
2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 
2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 
2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 
2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 
2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 
2251 ATTAAAGGGG ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 
2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG 
2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 
2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 
2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 
2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 



1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS 
51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP 
101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA 
151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV 
201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS 
251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL 
301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL 
351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL 
401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ 
451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA 
501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE 
551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD 
601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV 
651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH 
701 IELKQRPPEL 
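
The ORF coordinates and the peptide length reported above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the coordinates are 1-based, inclusive, and exclude the stop codon, which is consistent with the figures in this report (18 bp to 2147 bp, 710 residues).

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based inclusive coordinates that do not include the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not precede start_bp")
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


if __name__ == "__main__":
    # Coordinates taken from the peptide information for frame 3.
    print(orf_peptide_length(18, 2147))
```

A mismatch between the computed and the reported length would point to an OCR error in one of the coordinates.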

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17f10, frame 3 

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P 
= 7.4e-43 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 475, P = 1e-42 



>PIR:A37221 neurofilament triplet H protein 
Length = 1,072 



Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43 
Identities = 185/622 (29%), Positives = 320/622 (51%) 



Query: 


33 


Sbjct: 


436 


Query: 


93 


Sbjct: 


496 


Query: 


153 


Sbjct: 


555 


Query: 


212 


Sbjct: 


610 


Query: 


269 


Sbjct: 


670 


Query: 


328 


Sbjct: 


722 


Query: 


384 



++EE 



V+ P+T ++P 



E G + + TS 



+ A K + AE + P+ K PA+++ P ++ A 



+ A A++ +V+ P 



P++ +SP E + PAE 



KSP+ V+ E +SP+ K+P+ 



A + PA+ ++PAEA+SP+ E S E+A + V+ 
-AEAKSPAEAKSPAEAKSPV-EVKSPEKAKSPVKEGAK 775 



Query: 384 PLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPLSEET-SAEEAPA-EVQSPSAKGV 440 
LAE + E+A + ++ I+ PA+ ++P +A+SP+ EE S E+A +V+SP AK 
Sbjct: 776 SLAEAKSPERARSPVK--EEIKPPAEVKSPEKAKSPMKEEARSPERAKTLDVKSPEAKTP 833 

Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA---EEAPAEVQPPPAEEAPAE 494 
+ EEA P +++ P ++ A EEA + + TE A EE + V+ A+E P + 
Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPRK 893 

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553 
V+ P EV+ +EAP E Q P AEE + P +++P E + EEA 
Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKK--EEAKE 948 

Query: 554 EVQPPPAEEAPAEV QSLP--AEETPIEETL--AAVHSPPADDVPAEEASVD-KHS 603 
+ P EE PA++ ++ P AE+ +E + P ++VPA D K 
Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008 

Query: 604 PPADLLLTEEFPIGEASAEVSPP--PSEQT-PEDEALVENVSTEFQSPQ 649 
+ EE P +A A+ P E + P+ E ++ ST+ + Q 
Sbjct: 1009 KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDQKDSQ 1057 




Score = 473 (71.0 bits), Expect = 4.8e-42, P = 4.8e-42 
Identities = 184/628 (29%), Positives = 310/628 (49%) 

Query: 18 IPPVEKVDKEQQTYFSESEIVVISRP DSSSTKSKEDALKHKSSGKI FASEHPEFQPA 74 
I VEK +KE ++E + ++ + E+ + + G+ A+ P + A 
Sbjct: 440 IKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 499 

Query: 75 TNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPS 134 
+ +E + + + + KP E+ E P + AK + AE + P+ 
Sbjct: 500 AS PEKET- KSP VKEEAKS PAEAKS PA EAKS PAEAKS PAEVKS PAEVK-S PAEAKS PA 554 

Query: 135 TEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQ-PSTEETPDAEAATAVAENSVKVQPPP 193 
K PA+++ P ++ A+A+ ++ +V+ P+T ++P + A A++ +V+ P 
Sbjct: 555 EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 614 

Query: 194 AEEAPL-VEFPAEIQPPSAEESPS-VELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS 250 
++P + PA ++ P +SP+ + AE+ P+ +SP E + PAE+ P KSP+ 
Sbjct: 615 EAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPA 674 

Query: 251 -VELLGEIRSPSAQKAPIEVQ-PLPAEGALE-EAPAKVEPPTVEETLAEVQPLLPEEAPR 307 
+ E++SP++ R+P E + P A+ E ++P + P ++ AE +P ++P 
Sbjct: 675 EAKSPVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPA 734 

Query: 308 EEARELQLSTAME--TPAE-EAPTEFQSP LP-KE TTAEEASAEIQLLAATE-- 354 
E + + E +PAE ++P E +SP P KE + AE S E E 
Sbjct: 735 EAKSPAEAKSPAEAKSPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEI 794 

Query: 355 -PPAD-ETPAEARSPLSEET-SAEEAHA-EVQSPLAEETTAEEAS--AEIQLLAAIEAPA 408 
PPA+ ++P +A+SP+ EE S E+A +V+SP A+ EEA A+I+ +++PA 
Sbjct: 795 KPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPA 854 

Query: 409 DETPAEAQSPLSEETSAEE-APA--EVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAA 465 
E EA+SP EET E+- AP EV+SP +EE + +PP E EE + A 
Sbjct: 855 KE EAKSPEKEETRTEKVAPKKEEVKSP VEEVKAK-EPPKKVE EEKTPA 901 

Query: 466 IQLLAATEASAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAE 525 
E+ +EAP E Q P AEE + P +++P E + A+E A P E 
Sbjct: 902 TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKKEEAKEKKAAA PEE 956 

Query: 526 EAPAEV QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581 
E PA++ + P EtA P++ PSE + P EE PA + +E E+ 
Sbjct: 957 ETPAKLGVKEEAKPKEKAEDAKAKEPSK--PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014 

Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636 
P EE DR P TE+ ++ + PSE+ PED+A 
Sbjct: 1015 KPEEKPKMQAKAKEE DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067 


Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36 
Identities = 162/540 (30%), Positives = 275/540 (50%) 

Query: 135 TEKFPAKIQPPLVEEATAKAEPR PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189 
TE P RI P + K+E + +E+ V V+ TEE E T E + 
Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTE--EEDKEA 474 

Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE--SPSE-EPPAEILPPPAE 246 
Q EEA A P AEE+ S E E P EE SP+E + PAE P 
Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE--KETKSPVKEEAKSPAEAKSPAEAKSPAEA 532 

Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 
RSP+ E++SP+ K+P E + PAE ++PA+V+ P ++ AE + ++P 
Sbjct: 533 KSPA EV KS PAE VKS P AEAKS - PAEA-- KS PAEVKS PAT VKS PAEAKS PAEAKS P 583 

Query: 307 REEARELQLSTAME--TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361 
E + + E +PAE ++P E +SP+ ++ AE S A ++ + PA+ + + P 
Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643 

Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPL 419 
AE +SP + ++ E ++ AEV+SP+ ++ AE A + ++ +++PA ++P+EA+SP 
Sbjct: 644 AEVKSPATVKSPVEAKSPAEVKSPVTVKSPAE-AKSPVE VKSPASVKSPSEAKSP- 697 

Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478 
+ ++PAE +SP AK + ++P E +PP+ ++ AE S A A + A A+ 
Sbjct: 698 AGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSPAE--AKSPAEAK- 749 

Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA--EVQPPPAEEAPAEVQPP 534 
+PAE + P ++P ++PEA AE+P ++P E++PP ++P + + P 
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809 

Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA--PAEVQSLPAEETPIEETLAAVHSPPADDV 592 
EEA + + + E + P EEA PA+++S ++P +E SP ++ 
Sbjct: 810 MKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKEE--AKSPEKEET 866 

Query: 593 PAEEASVDKHS--PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650 
E++K P + ++EP + E P++T E++ EQP+ 
Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP--KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924 

Query: 651 AGIPAVKLGSVVLEGEAKFEEVSK 674 
+ GEAR EE + 
Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948 

Score = 406 (60.9 bits), Expect = 1.7e-34, P = 1.7e-34 
Identities = 123/390 (31%), Positives = 213/390 (54%) 

Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA EA 364 

E+ E+Q++ E EE E Q +E AEE E AT PPA+E + E 
Sbjct: 455 EQTEE I QVT EEVT EEEDK EAQGE - - EEE EAEEGGEEA ATTS P P AE E AAS PE K ET 506 

Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP + 
Sbjct: 507 KSPVKEEAKSP AEAKSPAEAKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVK 563 

Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480 

+ A + + PAE +SP+ AK + + + P ++ P GE + EA + ++ + EA ++P 
Sbjct: 564 SPATVKSPAEAKSPAEAKSPAEVKSPATVKSP-GEAKSPAEAKS PAEVKSPVEA---KSP 619 

Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540 

AE + P + ++P E + P ++PAEV+ P ++P E + P ++P V+ P ++P 
Sbjct: 620 AEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPAEAKSP 679 

Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPAD-DVPAEEASV 599 

EV+ P + ++PSE + P ++PAE +S ++P E A PPA+ PAE S 
Sbjct: 680 VEVKSPASVKSPSEAKSPAGAKSPAEAKSPWAKSPAEAKS PAEAKPPAEAKSPAEAKSP 739 

Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659 

+ PA+ E ++ EV P ++P E ++++ E +SP+ A P VK 

Sbjct: 740 AEAKSPAEAKSPAE— AKSPVEVKSPEKAKSPVKEG-AKSLA-EAKSPEKAKSP-VK-E 792 

Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697 

+ E K E +K S +K+ + + + +A TL+++S 
Sbjct: 793 EIKPPAEVKSPEKAK— SPMKEEAKSPE-KAKTLDVKS 827 

Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18 
Identities = 124/420 (29%), Positives = 199/420 (47%) 

Query: 252 ELLG E I RS P S AQKAP I EVQPL PA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

ELLG + I+ A +A + + A AL E A++E TV+ TL + 

Sbjct: 236 ELLGQ I QGCGAAQAQAQAEARDAL KCDVTSALREI RAQLEGHTVQS T LQS EE WFRVRLDR 295 

Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366 

EA ++ + AM + EE TE++ L TT E++ L +T+ + +E 

Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT ELEALKSTKESLERQRSELED 347 

Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA—EIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+ S ++A ++ + L TEA+ EQL++ D E A + EE 
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-RNTKWEMAAQLREYQDLLNVKMALDIEIAAYRKLLEGEE 406 

Query: 423 TSAEEAPAEV — QSPS-AKGVSIE-EAPLELQPPSGEETT-AEEASAAIQLLA-A 471 

p+ + PS + + ++ E +++ S +ET EE + IQ+ 

Sbjct: 407 CRIGFGPSPFSLTEGLPKIPSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEV 466 

Query: 472 T EAS AEEA P AEVQ PP P AEEAPAE VQP — P PAEEA PA EVQPPPAEEA- - PAEVQPPPA 524 

TE +EA E + AEE E PPAEEA + E + P EEA PAE + P 

Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTSPPAEEAASPEKETKSPVKEEAKSPAEAKSPAE 525 

Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582 
++PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A 
Sbjct: 526 AKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584 

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 
V SP PES + PA++ E ++ AE PS ++P E ++ E 
Sbjct: 585 EVKSPATVKSPGEAKSPAEAKSPAEVKSPVE AKSPAEAKSPASVKSPGEAKSPAEAK 641 

Query: 642 S-TEFQSPQVAGIP 654 
S E +SP P 
Sbjct: 642 SPAEVKSPATVKSP 655 




Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18 
Identities = 115/364 (31%), Positives = 166/364 (45%) 



Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE +PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 
Sbjct: 705 EAKSPWAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762 

Query: 168 PSTEETPDAEAATAVAE— NSVKVQPPPAEEA—PL-VEFPAEIQPPSAEE— SPSVELL 220 

P ++P E A ++AE + K + P EE P V+ P + + P EE SP 
Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKSPEKAKT 822 

Query: 221 AEILPPSAEESPSEEP-- PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE-- 275 

++ P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
Sbjct: 823 LDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKE--EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+EE AK P VEE E P P+ +E ++ A + AEE P TE 

Sbjct: 880 SPVEEVKAKEPPKKVEE— EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936 

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE--ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

Sbjct: 937 SPGEAKKEEAKEK KAAA--PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAIEAPADETPAEAQSPLSEETSAEEAPAEVQSPSA-KGVSIEEAPLE 448 

EE A + E E+ + P + + EE Q PS K E+ + 

Sbjct: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

Sbjct: 1052 DQKDSQPSEKAPEDKAA 1068 
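
The Identities and Positives percentages in the score headers above appear to be truncated rather than rounded (for example, 275/540 is 50.9% but is reported as 50%). The following sketch checks a header line under that assumption; the function names are hypothetical and the truncation rule is inferred from the figures in this report, not taken from BLAST documentation.

```python
import re

# Matches e.g. "Identities = 162/540 (30%)" or "Positives = 275/540 (50%)".
STAT_RE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def truncated_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting rule)."""
    return (100 * matches) // aligned


def check_stat_line(line: str) -> dict:
    """Return, per statistic, whether the reported percentage matches the counts."""
    result = {}
    for label, matches, aligned, reported in STAT_RE.findall(line):
        result[label] = truncated_percent(int(matches), int(aligned)) == int(reported)
    return result


if __name__ == "__main__":
    print(check_stat_line("Identities = 162/540 (30%), Positives = 275/540 (50%)"))
```

A `False` entry would suggest that a count or a percentage in the header was damaged during text extraction.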



Pedant information for DKFZphtes3_17f10, frame 3 



Report for DKFZphtes3_17f10.3 



[LENGTH] 710 

[MW] 75131.94 

[pi] 4.02 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 34.09 % 



SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS 

SEG 

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc 

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS 

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhccccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG xxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc 

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccccccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 

SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS 

SEG xxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 

SEG xxxxxxxxxxx xxxxxxxxxxx XXXXXXXX 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS 

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIELKQRPPEL 



PRD eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 



(No Prosite data available for DKFZphtes3_17f10.3) 
(No Pfam data available for DKFZphtes3_17f10.3) 

DKFZphtes3_17117 



group: metabolism 

DKFZphtes3_17117 encodes a novel 626 amino acid protein with similarity to transketolases (EC 
2.2.1.1). 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new 
testis-specific transketolase. Transketolase requires thiamin pyrophosphate as cofactor and 
shows a wide specificity for both reactants, e.g. it converts hydroxypyruvate and R-CHO into 
CO(2) and R-CHOH-CO-CH(2)OH. 

The new protein can find application in modulation of metabolic pathways involving this 
transketolase activity and as a new enzyme for biotechnological production processes. 



strong similarity to transketolases 

few EST hits (all from testis or pooled libraries containing testis) 
testis specific transketolase? 



Sequenced by GBF 
Locus : unknown 



Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630 



1 GACAAAAGAG AGATGATGGC 

51 GCAGGTGCTG CGGGACACAG 

101 CCACGTGTGC CTCTGGTTCT 

151 GAGGTCGTGT CTGTCCTCTT 

201 CCCAGAACAC CCGGACAACG 

251 CTCCTATCCT CTATGCTGCT 

301 GACTTGCTGA ACCTGAGGAA 

351 CCCGCGATTG CCGTTTGTTG 

401 TAGGTACTGC ATGTGGAATG 

451 AGCTACCGGG TGTTCTGCCT 

501 TGTGTGGGAG GCTTTTGCTT 

551 TGGCGGTCTT CGACGTGAAC 

601 GAGCATGGCG CAGACATCTA 

651 TACTTACTTA GTGGATGGCC 

701 GGCAAGCAAG TCAAGTGAAG 

751 TTCAAAGGTC GGGGTATTCC 

801 AAAGCCAGTG CCAAAAGAAA 

851 GTCAGATACA GACCAATGAG 

901 TCACCTCAAA TAAGCATCAC 

951 CAAAGTTGGT GACAAGATAG 

1001 CTAAACTGGG CCGTGCAAAT 

1051 ATGAACTCCA CCTTTTCTGA 

1101 CATAGAGTGT ATTATTGCTG 

CAACGACGCC AAGCCCGACG TGAAGACCGT 
CCAACCGCCT GCGGATCCAT TCCATCAGGG 
GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 
CTTCCACACG ATGAAGTATA AACAGACAGA 
ACCGGTTCAT CCTCTCCAGG GGACATGCTG 
TGGGTGGAGG TGGGTGACAT CAGTGAATCT 
ACTTCACAGC GACTTGGAGA GACACCCTAC 
ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 
GCTTATACTG GCAAGTACCT TGACAAGGCC 
TATGGGAGAT GGCGAATCCT CAGAAGGCTC 
TTGCCTCCCA CTACAACTTG GACAATCTCG 
CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 
CCAGAATTGC TGTGAAGCCT TTGGATGGAA 
ATGATGTGGA GGCCTTGTGC CAAGCATTTT 
AACAAGCCTA CTGCTATAGT TGCCAAGACC 
AAATATTGAG GATGCAGAAA ATTGGCATGG 
GAGCAGATGC AATTGTCAAA TTAATTGAGA 
AATCTCATAC CAAAATCGCC TGTGGAAGAC 
AGATATAAAA ATGACCTCCC CACCTGCTTA 
CTACTCAGAA AACATATGGT TTGGCTCTGG 
GAAAGAGTTA TTGTTCTGAG TGGTGACACG 
GATATTCAGG AAAGAACACC CTGAGCGTTT 
AACAAAACAT GGTAAGTGTG GCACTAGGCT 

1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 
1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 
1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 
1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 
1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 
1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG 
1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 
1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 
1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 
1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 
1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 
1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 
1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 
1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 
1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 
1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 
1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 
2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 
2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 
2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 
2151 AAATAAAACA ACTACCTAAT ACAAATATTT CTGATAAGAC TACAAATATC 
2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 
2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 
2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 
2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 
2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA 
2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA 
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTCAGA 

2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA 

2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA 

2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



NO BLAST result 



Medline entries 



96214928: 

Amplification of the transketolase gene in desensitization-resistant 
mutant Y1 mouse adrenocortical tumor cells. 

99123875: 

Properties and functions of the thiamin diphosphate dependent enzyme 
transketolase. 



Peptide information for frame 1 



ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATP_GTP_A (595-603) 



1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS 
51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN 
101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV 
151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA 
201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR 
251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI 
301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST 
351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA 
401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF 
451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR 
501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII 
551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG 
601 KTSELLDMFG ISTRHIIAAV TLTLMK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17117, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1, 
Score = 2222, P = 2.5e-230 

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2202, P = 3.3e-228 

TREMBL:RN09256_1 product: "transketolase"; Rattus norvegicus 
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202, 
P = 3.3e-228 

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2200, P = 5.3e-228 



>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 
Length = 623 

HSPs: 

Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 
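
The percentages in the HSP header follow directly from the two ratios. The short Python sketch below reproduces them. It assumes the report truncates to a whole number rather than rounding, which is an inference from the figures shown (417/614 is about 67.9 but is printed as 67%); the function name `percent` is purely illustrative.

```python
def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (assumed report convention)."""
    return matches * 100 // aligned


# Counts taken from the HSP header above.
identities = percent(417, 614)
positives = percent(501, 614)

print(f"Identities = 417/614 ({identities}%), Positives = 501/614 ({positives}%)")
```

If a rounded value is preferred, `round(100 * matches / aligned)` could be used instead, at the cost of no longer matching the printed report.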

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66 

KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP + 
Sbjct: 6 KPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRN 65 



Query: 67 PDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVDVATGSL 126 

P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL 
Sbjct: 66 PHNDRFVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSL 125 

Query: 127 GQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLVAVFDVN 186 

GQGLG ACGMAYTGKY DKASYRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N 
Sbjct: 126 GQGLGAACGMAYTGKYFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDIN 185 

Query: 187 RLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPTAIVAKT 246 

RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT 
Sbjct: 186 RLGQSDPAPLQHQVDIYQKRCEAFGWHTIIVDGHSVEELCKAFGQA---KHQPTAIIAKT 242 

Query: 247 FKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQISITDIK 306 

FKGRGI I ED E WHGKP+PK A+ I++ I SQ+Q+ + ++ P ED+P + I +I+ 
Sbjct: 243 FKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQSKKKILATPPQEDAPSVDIANIR 302 

Query: 307 MTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHPERFIEC 366 

M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC 
Sbjct: 303 MPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSELFKKEHPDRFIEC 362 

Query: 367 IIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSHCGVSTG 426 

IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G 
Sbjct: 363 YIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVSIG 422 

Query: 427 EDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPETAVIYT 486 

EDG SQMALEDLAMFRS+P TVFYPSD ++TE A+ LAANTKG+CFIRTS+PE A+IY+ 
Sbjct: 423 EDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYS 482 

Query: 487 PQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFTIKPLDA 546 

E+F++GQAKVV +D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD 
Sbjct: 483 NNEDFQVGQAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDR 542 

Query: 547 ATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRGKTSELL 606 

I+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V +LAVS VP+ GK +ELL 
Sbjct: 543 KLILDSARATKGRILTVEDHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELL 602 

Query: 607 DMFGISTRHIIAAV 620 

MFGI I+ AV 
Sbjct: 603 KMFGIDKDAIVQAV 616 



Pedant information for DKFZphtes3_17117, frame 1 



Report for DKFZphtes3_17117.1 



[LENGTH] 626 

[MW] 67877.52 

[pI] 5.90 

[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0 

[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48 

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32 

[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32 

[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32 

[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17 

[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09 

[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate 
dehydrogenase] 2e-05 

[BLOCKS] BL00801F 

[BLOCKS] BL00801E 

[BLOCKS] BL00801D Transketolase proteins 

[BLOCKS] BL00801C Transketolase proteins 

[BLOCKS] BL00801B Transketolase proteins 

[BLOCKS] BL00801A Transketolase proteins 

[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21 

[EC] 1.2.4.1 Pyruvate dehydrogenase (lipoamide) 8e-11 

[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10 

[EC] 2.2.1.1 Transketolase 0.0 

[EC] 2.2.1.3 Formaldehyde transketolase 1e-20 

[PIRKW] transferase 0.0 

[PIRKW] flavoprotein 2e-07 

[PIRKW] Calvin cycle 1e-40 

[PIRKW] heterotetramer 2e-07 
[PIRKW] pentose phosphate pathway 0.0 

[PIRKW] magnesium 1e-40 

[PIRKW] thiamine pyrophosphate 0.0 

[PIRKW] oxidoreductase 7e-12 

[PIRKW] fatty acid biosynthesis 4e-10 

[PIRKW] mitochondrion 2e-07 

[PIRKW] peroxisome 1e-20 

[PIRKW] homodimer 1e-40 

[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06 

[SUPFAM] pyruvate dehydrogenase (lipoamide) beta chain 7e-12 

[SUPFAM] ferredoxin 2[4Fe-4S]-related protein 8e-47 

[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0 

[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08 

[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47 

[SUPFAM] hypothetical protein C2814 2e-21 

[SUPFAM] transketolase 0.0 

[PROSITE] ATP_GTP_A 1 

[PFAM] Transketolase 

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 3.04 % 

SEQ MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK 

SEG 

IngsB HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT 



SEQ QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD 

SEG 

IngsB TTTTTTTTTCEEEETTGGGHHHHHHHHHHHCTTCHHHHHTTTTTTTTTTTTTTTTTTTTC 

SEQ VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLV 

SEG 

IngsB CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTTEE 

SEQ AVFDVNRLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT 

SEG 

IngsB EEEEECCEETTEEGGGCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE 

SEQ AIVAKTFKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQI 

SEG 

IngsB EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHH 

SEQ SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHP 

SEG 

IngsB HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHHHHTTTTTEEEEETTTHHHHCCTTCEECCG 

SEQ ERFIECIIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSH 

SEG xxxxxxxxxxxxxxxxxxx 

IngsB GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGHHHHHHHHHHCTTTEEEEEC 

SEQ CGVSTGEDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPE 

SEG 

IngsB CCGGGTTTTTTTTCCHHHHHHHCTTTTEEECCCCHHHHHHHHHHHTTTTCEEEECCCCCB 

SEQ TAVIYTPQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFT 

SEG 

IngsB CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE 

SEQ IKPLDAATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG 

SEG 

IngsB 

SEQ KTSELLDMFGISTRHIIAAVTLTLMK 

SEG 



Prosite for DKFZphtes3_17117.1 
PS00017 595->603 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_17117.1 
HMM_NAME Transketolase 

HMM *vNtIRiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN 
+N++RI ++ A + +SG ++++++A++ VL++++M+++++DP P+ 
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68 

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT 

+DRF+LS GHA+++LY+ W + G ++++DL+++R++HS++ +HP ++ 
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117 

HMM PGVEVTTGPLGQGIaNaVWMAIAERnLAATYNRPGFDIfDHYTYCFMGDG 
++ +V+TG+LGQG++ +++++Y++++ D+++++++C+MGDG 

Query 118 FV-DVATGSLGQG LG TACGMAYTGKYLDKASYRVFCLMGDG 157 

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSIDGdTdlWFqEDtYakRF 
+ +EG++WEA ++A+H++L+W++A +D NR++++G++++ + D+Y+ + 
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADIYQNCC 207 

HMM EAYGWHVIEVEnDGHDvEelcaAIEeAKaekDRPTLIiCRTVIGYGSPNk 

EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN 
Query 208 EAFGWNTYLV--DGHDVEALCQAFWQASQVKNKPTAIVAKTFKGRGIPNI 255 

HMM QGTHdWHGAPLGeD* 

++ + WHG+P +++ 
Query 256 EDAENWHGKPVPKE 269 

HMM *PqWePnddkIATRKASQqaLeaiGPaLPEfWGGSADLTPSNLTrKKGmv 
P++++ +DKIAT K+++ AL+++G A +++ +S+D+ +S++++++ ++ 
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKE 358 

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGI AlHGgNFRPYGGT 

+ + R+I++ I+E++M++++ G+A++G+ ++++ G 

Query 359 HPERFIECIIAEQNMVSVALGCATRGR-TIAFAGA 392 

HMM FMMFyDYARPAIRHAALMelPVIWVWTHDSIGLGEDGPTHQPVEHLAHFR 
F++F+++A++++RM A++ +++++++H++++ GEDG +++++E+LA+FR 
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442 

HMM alPNMsVWRPCDgNETayAWylAvERehTPtiLILSRQNLPQIErNPrqf 

+IPN +V++P+D+ T+ A YLA+++++ +++++S ++ +++++ P + 
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490 

HMM ekvaRGGYVLkDmdnePDVILIATGSEMELAvaAAKlLadEGIkaRWSM 
++++++++V + + + V++I++G+++++A++AA+ L+ +GI +RV+++ 
Query 491 FEIGQAKVVRHGVN--DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538 

HMM PCTeWFD kQDeEYRe SVLPDhVPqRVaVEmGvtWCWYKYVGqq 

++++++D + ++++R +++DH++ +++++++V ++ +++ + 

Query 539 FTIKPLDAATIISSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587 

HMM Gal f GMNr FGESSGKAPpevLYkMFGFTPENI * 

+ +++ +++ ++ +L+ MFG+ +1 

Query 588 VHQLAVSGVPQR GKTSELLDMFGISTRHI 616 
DKFZphtes3_17nl2 



group: transcription factors 

DKFZphtes3_17nl2.1 encodes a novel 804 amino acid protein which is nearly identical to mouse 
and trout SOX-LZ. 

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the 
regulation of developmental processes such as germ layer formation, organ development and cell type 
specification. Deletion or mutation of Sox proteins often results in developmental defects and 
congenital disease in humans. Sox proteins perform their function in a complex interplay with 
other transcription factors in a manner highly dependent on cell type and promoter context. 
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper. 

The new protein can find application in modulating/blocking the expression of SOX-controlled 
genes. 



nearly identical to mouse SOX-LZ 

complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis 

Sequenced by GBF 

Locus: unknown 

Insert length: 2802 bp 

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660 



1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA 
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA 
501 CGAAGGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAACCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA 
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 
1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT 
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT 
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA 
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG 
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA 
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT 
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA 
2451 TCGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG 
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA 
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG 
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA 
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA 
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AA 



BLAST Results 



NO BLAST result 



Medline entries 



95311974: 

A gene that is related to SRY and is expressed in the testes 
encodes a leucine zipper-containing protein. 

96032826: 

The Sry-related HMG box-containing gene Sox6 is expressed in the 
adult testis and developing nervous system of the mouse. 



Peptide information for frame 1 



ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 
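
The peptide length quoted with each ORF can be cross-checked against the ORF coordinates. The Python sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; both assumptions are inferred from the figures in this listing rather than stated in it, and the function name `peptide_length` is illustrative only.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


# ORF coordinates as quoted in this listing.
print(peptide_length(184, 2595))   # this clone
print(peptide_length(13, 1890))    # DKFZphtes3_17117
```

A span that is not a multiple of three would point to a misread coordinate, which makes this a quick sanity check on digitized listings.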



1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM 
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST 
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ 
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ 
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ 
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY 
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST 
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK 
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE 
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG 
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE 
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP 
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ 
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF 
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP 
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA 
801 VSAN 



BLASTP hits 
Entry MMSOXLZ2_1 from database TREMBL: 

product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 

Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801 

Entry I51083 from database PIR: 
SOX-LZ - rainbow trout 

Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532 

Entry S59121 from database PIR: 
SOX6 protein - mouse 

Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660 

Entry AB006330_1 from database TREMBL: 

gene: "mSoxSL"; product: "SOX5"; Mus musculus mSoxSL mRNA, complete 
cds. 

Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457 

Entry MMU010604_1 from database TREMBL: 

gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for 
transcription factor L-Sox5 

Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281 
Alert BLASTP hits for DKFZphtes3_17nl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl2, frame 1 

Report for DKFZphtes3_17nl2.1 

[LENGTH] 804 

[MW] 89332.69 

[pI] 6.97 

[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07 

[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. 
cerevisiae, YPR065w] 5e-06 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 
7e-06 

[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04 

[SCOP] d1hmf 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13 

[SCOP] d1lefa_ 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15 

[SCOP] d1hrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17 

[PIRKW] DNA binding 4e-94 

[PIRKW] T-cell receptor 4e-07 

[PIRKW] leucine zipper 1e-38 

[PIRKW] alternative splicing 2e-07 

[PIRKW] transcription factor 4e-16 

[PIRKW] transcription regulation 1e-12 

[SUPFAM] HMG box homology 0.0 

[SUPFAM] unassigned HMG box proteins 4e-94 

[PROSITE] ATP_GTP_A 1 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 6 

[PFAM] HMG (high mobility group) box 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 13.81 % 

[KW] COILED_COIL 3.48 % 



SEQ MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP 

SEG 

COILS 

lnhm- 

SEQ TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF 

SEG 

COILS • 

lnhm- 

SEQ GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI 

SEG 

COILS 

lnhm- 

SEQ KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI 

SEG xxxxxxxxxxxxxxx 

COILS CCCCCC 

lnhm- 

SEQ ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY 

SEG xxxxxxxxxxxxxxxxxxxxxx XXXXXX 

COILS CCCCCCCCCCCCCCCCCCCCCC 

lnhm- 

SEQ KPGDNYPVQFIPSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV 

SEG xxxxxxxxxxxx 

COILS 

lnhm- 

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL 

SEG 

COILS 

lnhm- * 

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR 

SEG . . . xxxxxxxxxxxxxxxxxx 

COILS 

lnhm- 

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG 

SEG . .xxxxxxxxxxxx 

COILS 

lnhm- 

SEQ PGVIDLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP 

SEG 

COILS 

lnhm- CCC 

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK 

SEG X 

COILS 

lnhm- CCCHHHHHHHHHHHHHHHTTTTCCHHHHHHHHHHHTTTTTTHHHHHHHHHHHHHHHHHHH 

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQIPITTGTGVVYP 

SEG xxxxxxxxxxxx. 

COILS 

lnhm- HHHTTTTTTT 

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY 

SEG xxxxxxx 

COILS 

lnhm- • 

SEQ DDYEDDPKSDYSSENEAPEAVSAN 

SEG xxxxxx 

COILS 

lnhm- 
PS00001 97->101 ASN_GLYCOSYLATION PDOC00001 
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001 
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001 
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001 
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001 
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001 
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005 
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005 
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005 
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005 
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006 
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006 
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006 
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006 
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006 
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006 
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006 
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006 
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 431->437 MYRISTYL PDOC00008 
PS00008 437->443 MYRISTYL PDOC00008 
PS00008 509->515 MYRISTYL PDOC00008 
PS00008 575->581 MYRISTYL PDOC00008 
PS00008 762->768 MYRISTYL PDOC00008 
PS00009 677->681 AMIDATION PDOC00009 
PS00017 526->534 ATP_GTP_A PDOC00017 
PS00029 187->209 LEUCINE_ZIPPER PDOC00029 



Pfam for DKFZphtes3_17nl2.1 
HMM_NAME HMG (high mobility group) box 

HMM *PKRPMNAYMLWMQEMReklKaENPNdMhNtEISKMiGEMWKnMsEEEKin 
+KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK+ 
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 644 

HMM PYEdMAeeEKqRYMKEMPeYK* 
PY+++ +++ + +++ +P+YK 
Query 645 PYYEEQARLSKIHLEKYPNYK 665 
DKFZphtes3_17nl8 



group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known 
proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent 
receptor protein signature 1. In E. coli, the tonB protein interacts with outer membrane 
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In 
the absence of tonB these receptors bind their substrates but do not carry out active 
transport. The novel protein seems to be involved in ATP-dependent transport of substances 
into the cell. 

The new protein can find application in modulation of cell-permeability and transport of 
suitable substrates into the cell. 



unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern 

Sequenced by GBF 

Locus: unknown 

Insert length: 2853 bp 

Poly A stretch at pos. 2806, no polyadenylation signal found 



1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT 
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA 
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC 
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC 
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT 
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC 
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG 
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG 
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA 
451 GCACAGCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC 
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA 
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG 
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG 
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA 
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG 
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC 
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA 
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC 
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT 
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA 
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG 
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC 
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG 
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA 
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC 
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA 
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA 
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC 
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT 
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA 
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG 
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA 
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC 
1651 CCAACCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC 
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC 
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG 
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG 
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA 
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG 
1951 ACGTGGAGCT GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG 
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT 
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT 
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC 
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC 
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG 
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA 
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA 
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG 
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA 
2451 TTGGCCCTGG AAGACTATGT GGAtjAAGGAG TTATCTCTGG AGGCTGAGAA 

2501 GACAAGAGAG CCTGAACTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA 

2551 TAACTAGTTG GAAGAAGCAG CCCTdCAAGA AGTAGCGCCA TCCTGGCAGC 

2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC 

2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGr CCTCCAAGCT TCTATAATAA 

2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA 

2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 

2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG 

2851 CCG 



ORF from 237 bp to 2582 bp; peptide length: 782 

Category: putative protein 

Prosite motifs: ATP_GTP_A (122-130) 

TONB_DEPENDENT_REC_1 (1-44) 
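
The link between the nucleotide listing and the peptide below it is a plain frame translation from the ORF start. The Python sketch that follows uses the standard genetic code; the helper `translate` and the table construction are illustrative and are not the tool used to prepare this report. The test input is the 24 bases that begin at position 237 of the listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT characters
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 237-260 of the listing above (start of the ORF).
orf_start = "ATGG CCCGTCAGGT GCGCACCCAC"
print(translate(orf_start))
```

Because the lookup raises `KeyError` on any character outside A, C, G and T, it also flags bases that were misread during digitization.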



1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 

51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP 

101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS 

151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL 

201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 

251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 

301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 

351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR 

401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT 

451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD 

501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK 

551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT 

601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 

651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG 

701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS 

751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_17nl8, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_17nl8, frame 3 



Report for DKFZphtes3_17nl8.3 

[LENGTH] 782 

[MW] 88030.16 

[pI] 9.22 

[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins 

[PROSITE] ATP_GTP_A 1 

[PROSITE] MYRISTYL 4 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] TONB_DEPENDENT_REC_1 1 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Alpha_Beta 
SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL 

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH 

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL 

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE 

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeecccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL 

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT 

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYHHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKHSWQGM 

PRD ccchhhhhhhhhhhhhcccccccceececccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED 

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc 
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PS00001 


91->95 ASN_GLYCOSYLATION PDOC00001
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001
PS00001 379->383 ASN_GLYCOSYLATION PDOC00001
PS00001 598->602 ASN_GLYCOSYLATION PDOC00001
PS00004 403->407 CAMP_PHOSPHO_SITE PDOC00004
PS00004 511->515 CAMP_PHOSPHO_SITE PDOC00004
PS00004 652->656 CAMP_PHOSPHO_SITE PDOC00004
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005
PS00005 177->180 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 450->453 PKC_PHOSPHO_SITE PDOC00005
PS00005 497->500 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 523->526 PKC_PHOSPHO_SITE PDOC00005
PS00005 631->634 PKC_PHOSPHO_SITE PDOC00005
PS00005 723->726 PKC_PHOSPHO_SITE PDOC00005
PS00005 774->777 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 131->135 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 406->410 CK2_PHOSPHO_SITE PDOC00006
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006
PS00006 466->470 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 571->575 CK2_PHOSPHO_SITE PDOC00006
PS00006 693->697 CK2_PHOSPHO_SITE PDOC00006
PS00006 717->721 CK2_PHOSPHO_SITE PDOC00006
PS00008 145->151 MYRISTYL PDOC00008
PS00008 327->333 MYRISTYL PDOC00008
PS00008 592->598 MYRISTYL PDOC00008
PS00008 734->740 MYRISTYL PDOC00008
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PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013 

PS00017 122->130 ATP_GTP_A PDOC00017 

PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354 
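
The ATP_GTP_A hit listed above can be reproduced with a short pattern scan. The following is a minimal sketch, not the PEDANT implementation: it assumes the standard PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST], uses only residues 121-180 of the frame 3 peptide as given in the SEQ track, and assumes that the second coordinate in the listing is one past the last matched residue. The names SEGMENT, SEGMENT_START and scan are illustrative.

```python
import re

# Residues 121-180 of the frame 3 peptide, copied from the SEQ track above.
SEGMENT = "EAKKKIGKSRTTEDVSMPPLHRGVGTPAHSLEFSDPCPEAREKLQELCRHIEAERATWKG"
SEGMENT_START = 121  # 1-based position of the first residue in SEGMENT

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping matches be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def scan(segment, offset, pattern):
    """Return (start, end) pairs; start is 1-based, end is one past the match."""
    hits = []
    for m in pattern.finditer(segment):
        start = offset + m.start()
        hits.append((start, start + len(m.group(1))))
    return hits


print(scan(SEGMENT, SEGMENT_START, P_LOOP))
```

Under these assumptions the single hit corresponds to the 122->130 entry in the table.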



(No Pfam data available for DKFZphtes3_17nl8.3) 






WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_18f3 



group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to human 
TNF-inducible protein CG12-1. 

The novel protein contains two leucine zippers. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to TNF-inducible protein CG12-1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4608 bp 

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550 

1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 
51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 

101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 

151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 

201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 

251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG 

301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 

351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 

401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT 

451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG 

501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 

551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 

601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 

651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 

701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 

751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 

801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 

851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 

901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 

951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 
1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 
1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 
1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 
1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 
1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC 
1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 
1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 
1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 
1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 
1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA 
1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC 
1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 
1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 
1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTGTAGTC 
1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 
1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 
1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 
1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 
1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 
1951 CGTTCACAGA TGACCAAGGA CAGACTGTGT CCCAGAAGCC AAAATGAGAG 
2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT 
2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 
2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 
2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 
2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 
2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 
2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 
2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 
2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 
2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC 
2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA 
2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT 
2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT 
2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA 
2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT 
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2751 CTGTAAGAAA CTGTCAGTGA AAATATGTAC AATTCCTTCA ATTTCCATTC 

2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC 

2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT 

2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA 

2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT 

3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA 

3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC 

3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC 

3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC 

3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC 

3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA 

3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA 

3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC 

3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA 

3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT 

3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT 

3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC 

3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC 

3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA 

3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG 

3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT 

3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT 

3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG 

3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT 

3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA 

4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG 

4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT 

4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA 

4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT 

4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT 

4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC 

4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG 

4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA 

4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT 

4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT 

4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA 

4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

4601 AAAAAAGG 
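
The reported poly A and polyadenylation signal positions can be checked against the end of the listing. This is a minimal sketch under stated assumptions: the signal is taken to be the canonical AATAAA hexamer, a poly A stretch is taken to be a run of at least ten A residues (an arbitrary threshold chosen for illustration), and positions are counted from 0, which is the convention that reproduces the values given in the report. TAIL, TAIL_OFFSET and the function names are illustrative.

```python
import re

# Bases 4501-4608 of the insert (1-based), copied from the listing above.
TAIL = (
    "TGAGTTCATTTTTTCCCACTGTAGCAAAATTAATGCTTTCTCTTTATTGA"
    "AATAAATTGCTCATTCCTCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    "AAAAAAGG"
)
TAIL_OFFSET = 4500  # 0-based index of the first base of TAIL within the insert


def find_signal(seq, offset, hexamer="AATAAA"):
    """0-based position of the first hexamer occurrence, or None."""
    i = seq.find(hexamer)
    return None if i < 0 else offset + i


def find_poly_a(seq, offset, min_run=10):
    """0-based position of the first run of at least min_run A residues, or None."""
    m = re.search("A{%d,}" % min_run, seq)
    return None if m is None else offset + m.start()


print(find_signal(TAIL, TAIL_OFFSET), find_poly_a(TAIL, TAIL_OFFSET))
```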



BLAST Results 



Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 

Entry HS073350 from database EMBL: 
human STS EST303564. 
Score = 1417, P = 8.7e-58, identities = 285/287 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2 

PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score 
= 155, P = 4.5e-10 

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I 
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10 






>PIR:CGB01S collagen alpha 1(1) chain 
- bovine (fragments) 
Length = 779 

Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10 
Identities = 60/152 (39%), Positives = 67/152 (44%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGA--APARGGPAPGAPAQALPRSQRGR 62 
         G+ G PG + AR PG GPP PA P GA AP G A A P SQ 
Sbjct: 230 GDLGAPGPSGARGERGFPGERGVEGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAP 289 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
         L G P RGA PG GD +GA G + G VR L + PG A 
Sbjct: 290 GLQGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAG 341 

Query: 123 GAGDRGHL-P-GP DARDPELPRVFLPLAGLRGPPAA 156 
         GD+G P GP D +P P P AG GPP A 
Sbjct: 342 APGDKGEAGPSGPAGTRGAPGDRGEPGPPGP-AGFAGPPGA 381 

Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05 
Identities = 52/154 (33%), Positives = 60/154 (38%) 

(alignment text not legible; Query segments start at 7, 62 and 122, Sbjct segments at 434, 492 and 542) 

Score = 117, Expect = 1.8e-04, P = 1.8e-04 
Positives = 62/148 (41%) 

(alignment text not legible; Query segments start at 7, 63 and 121, Sbjct segments at 416, 473 and 529) 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04 
Identities = 54/162 (33%), Positives = 64/162 (39%) 

(alignment text not legible; Query segments start at 7, 61 and 116, Sbjct segments at 29, 89 and 149) 

Score = 113 
Positives = 58/148 (39%) 

(alignment text not legible; Query segments start at 7, 58 and 118, Sbjct segments at 374, 434 and 487) 

Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03 






Identities = 54/151 (35%), Positives = 60/151 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG--AAPAR-GGPAP-GAPAQALPRSQRGR 62 
         GE G G A + LPG A GPP A PG P G P P GA + +RG 
Sbjct: 194 GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 252 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
         + PR GA G GD A G+ G +G R A L PG 
Sbjct: 253 EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGLPGPK- 307 

Query: 123 GAGDRGHLPGPDARD--PELPRVFLPLAGLRGPPAAA 157 
         GDRG GP DP V L G GPP A 
Sbjct: 308 --GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340 

Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03 
Identities = 55/154 (35%), Positives = 60/154 (38%) 

Query: 4 NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 61 
         NG+ GEAG PG R P AGP A PG RG GA A P +G 
Sbjct: 67 NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 125 

Query: 62 RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLL 115 
         + NGP+G PG PG A GG G V A 
Sbjct: 126 EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 184 

Query: 116 QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 
         PG A AG+RG GP A P F L G GPP A 
Sbjct: 185 GPPGPAGPAGERGE-QGP-AGSPGFQGLPGPAGPPGEA 220 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 44/131 (33%), Positives = 49/131 (37%) 

Query: 2 EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 
         E GE G PG R LPG GP A PG A RG P P GA A + 
Sbjct: 126 EPGSPGENGAPGQMGPRGLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 181 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 
         G Q P RG G PG G+ G G G+ DL PG 
Sbjct: 182 GAQGPPGPAGPAGERGEQGPAGSPG--FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 237 

Query: 121 AEGAGDRGHLPG 132 
         + G+RG PG 
Sbjct: 238 SGARGERG-FPG 248 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 43/131 (32%), Positives = 55/131 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 
         GEAG GARA PG GPP PGA GP PGA Q + + G A+ 
Sbjct: 347 GEAGPSGPAGTRGA---PGDR-GEPGPPGPAGFAGP-PGADGQPGAKGEPGDAGAK 397 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 
         + P G PG G+ + A +GA G G + A + PG + AG 
Sbjct: 398 GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 456 

Query: 127 RGHLPGPDARD 137 
         G PGP ++ 
Sbjct: 457 PGP-PGPAGKE 466 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 56/162 (34%), Positives = 62/162 (38%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 64 
         G G PGA A G GP P P G A ARG P P Q PR +G 
Sbjct: 608 GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARGPAGP-QG-PRGBKGZTG 662 

Query: 65 AERNGRPRRHRG---ALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLA-DLLQ-LPG 119 
         + + + HRG PG PG GA G RG SDL LPG 
Sbjct: 663 ZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 722 

Query: 120 AAEGAGDRGHL--PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 
         G RG GP A P P P G GPP+ L +P Q 
Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPGP-PGPPGPPSGGYDLSFLPQPPQ 768 





Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02 
Identities = 49/148 (33%), Positives = 55/148 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGR 62 
         G AG PG A R PG A GP A G A A+G P P PA + P G 
Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGP---AGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207 






Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
         Q P G + G PGDL A GG RG R+ PGA 
Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGE-RGVEGPPGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 
         GGPGD + PG+GP 
Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP--GSQGAP 289 



Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02 
Identities = 40/130 (30%), Positives = 48/130 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60 
         G G PG + PG A+GP P PPG GGAPGP+P + 
Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 
         G R L G P + HRG G GD +G G G + L 
Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147 

Query: 118 PGAAEGAGDRG 128 
         PG AG+ G 
Sbjct: 148 PGPKGAAGEPG 158 

Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02 
Identities = 53/156 (33%), Positives = 61/156 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPAPGAPAQAL 55 
         G G PGA R A PG AGPPPG+RG GPA P PA A 
Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108 
         PR +G + + + HRG G PG + +G G G 
Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705 

Query: 109 RSLADLLQLPGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154 
         PG+A G G LPGP P PR AG GPP 
Sbjct: 706 PGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742 



Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02 
Identities = 51/158 (32%), Positives = 58/158 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 
         G G G R AA LPG AGP PG RG P G P A + 
Sbjct: 287 GAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDK 346 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 
         G A+GP RGA +PG PG GA G +G + D 
Sbjct: 347 GE--AGPSG-PAGTRGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP- 402 

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 
         PGAAGG+ AP+R GGPAAR 
Sbjct: 403 PGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGR 444 

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02 
Identities = 46/152 (30%), Positives = 57/152 (37%) 

Query: 6 NGEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62 
         +G G PGA + PG G PA PG AGPP PA++ R + G 
Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
         P RG G G+ +G G RG H R + L PG 
Sbjct: 634 AGPIGPVGPAGARGPAGPQGPRGBKGZTGZZGBRGIKGH-RGFSGLQGPPGPPG 686 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 
         G++G PAP AG RGPP +A 
Sbjct: 687 SPGEQG--PS-GASGP---------AGPRGPPGSA 709 

Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02 
Identities = 45/134 (33%), Positives = 56/134 (41%) 

Query: 24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR--GALAQ 80 
         P GPP PG +G P PG P + P RG G P ++ G + 
Sbjct: 21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP-----GPPGKNGDDGEAGK 75 

Query: 81 PGHPGDLAA-GV--GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH--LPGPDA 135 
         PG PG+ G RG G G H R + L G A AG +G PG + 
Sbjct: 76 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134 

Query: 136 RDPEL-PRVFLPLAGLRGPPAAA 157 
         ++ PR LP G GP AA 
Sbjct: 135 APGQMGPRG-LP--GFPGPKGAA 154 



Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 52/155 (33%), Positives = 58/155 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGP-APGAPAQALPRSQRGRQLA 65 
         GEAG G A R A G GPP PA G AGPAGP A + G 
Sbjct: 347 GEAGPSGPAGTRGAPGDRGEP-GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGP 405 

Query: 66 ERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGR--HHHVRSLADLLQLPGAA-- 121 
         P G + PG G + GA G GR A PG A 
Sbjct: 406 AGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGK 465 

Query: 122 EGA-GDRGHLPGPDARDPELPRVFLP-LAGLRGPPAA 156 
         EG+ G RG GP R E+ P AG +G P A 
Sbjct: 466 EGSKGPRGET-GPAGRPGEVGPPGPPGPAGEKGAPGA 501 

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 51/156 (32%), Positives = 57/156 (36%) 

Query: 7 GEAGGPGAAWARRA---AALPGT--AAGPPRPAAPPGAAPARGGPAPGAPAQAL-PRSQR 60 
         G G PGA R A PG AGPPPG+RG PP +P R 
Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLA-AGVG--RGAGGGHSRRGRH--HHVRSLADLL 115 
         GAGPR+G+GG G+GG G A 
Sbjct: 647 GP--AGPQG-PRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPR 703 

Query: 116 QLPGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154 
         PG+A G G LPGP P PR AG GPP 
Sbjct: 704 GPPGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742 

Score = 90 (13.5 bits), Expect = 2.8e-01, P = 2.5e-01 
Identities = 45/134 (33%), Positives = 53/134 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQ-LA 65 
         G G PG A + A GA P P PGA RG GPQ R +RG L 
Sbjct: 485 GPPGPPGPAGEKGAPGADGPAGAPGTPG-PQGIAGQRG--VVGLPGQRGERGFPGLP 538 

Query: 66 ERNGRPRRH--RGALAQPGHPGDLAAGVGR-GAGGGHSRRGRHHHVRSLADL 114 
         +G P + GA + G PG + AG GR GA G GR + D 
Sbjct: 539 GPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPGAEGSPGRDGSPGAKGDR 598 

Query: 115 LQL-PGAAEGAGDRGHLPGP 133 
         + PAG PGP 
Sbjct: 599 GETGPAGAPGPPGAPGAPGP 618 



Score = 83 (12.5 bits), Expect = 1.8e+00, P = 8.3e-01 
Identities = 49/156 (31%), Positives = 56/156 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARG--GPAP--GAPAQALPRSQR 60 
         G+AG GA A + G GPP PA PG G GPA GAP R + 
Sbjct: 311 GDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDKGEAGPSGPAGTRGAPGD---RGEP 367 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 
         G P G G PGD A G G G + ++ PG 
Sbjct: 368 GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGP 423 

Query: 121 AEGAGDRGHLPGPDARDPELPRVFLP----LAGLRGPPAAAVRE 160 
         G G PG RV P AG GPP A +E 
Sbjct: 424 KGARGSAGP-PGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKE 466 

Score = 82 (12.3 bits), Expect = 2.3e+00, P = 9.0e-01 
Identities = 46/148 (31%), Positives = 52/148 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 
         G+AG PGA ++ALGG A PG RGPAP RL 
Sbjct: 275 GDAGAPGAPGSQGAPGLQGMP-GERGAAGLPGPKGDRGDAGPKG-ADGAPGKDGVRGLTG 332 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 
         G P G PG G+ G G RG A PGA G 
Sbjct: 333 PIGPPGPAGAPGDKGEAGPSGPAGTRGAPGDRGEPGPPGP-AGFAGPPGADGQPGA 387 

Query: 127 RGHLPGP-DARDPELPRVFLPLAGLRGPP 154 
         +G PG A+ P P AG GPP 
Sbjct: 388 KGE-PGDAGAKGDAGPPG--P-AGPAGPP 412 



Peptide information for frame 3 






ORF from 12 bp to 755 bp; peptide length: 248 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (17-39) 
LEUCINE_ZIPPER (24-46) 
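
The ORF coordinates and the start of the frame 3 peptide can be cross-checked against the nucleotide listing. The sketch below is illustrative only: it assumes the standard genetic code, 1-based inclusive ORF coordinates, and uses just the first 60 bases of the insert; CODON_TABLE, DNA_START, translate and orf_length are names introduced here, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Bases 1-60 of the DKFZphtes3_18f3 insert, copied from the listing.
DNA_START = "GACAGAAGTGAATGGGAATGGAGAGGCCGGCGGCCCGGGAGCCGCATGGGCCCGACGCGC"


def translate(dna, orf_start):
    """Translate from a 1-based start position until a stop codon or the end."""
    peptide = []
    for i in range(orf_start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def orf_length(first_bp, last_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    return (last_bp - first_bp + 1) // 3


print(translate(DNA_START, 12), orf_length(12, 755))
```

Under these assumptions, translating from base 12 reproduces the first residues of the frame 3 peptide (MGMERPAAREPHGPDA), and the coordinates 12 to 755 correspond to 248 codons.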



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 3 

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo 
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score 
= 135, P = 1e-06 

TREMBL:HS6802_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens 
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L, 
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107, 
P = 0.0023 



>TREMBL:AF070675 1 product: "TNF-inducible protein CG12-1"; Homo sapiens 

TNF-inducible protein CG12-1 mRNA, complete cds. 

Length = 331 

HSPs: 

Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06 
Identities = 30/103 (29%), Positives = 55/103 (53%) 

Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89 
         ++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A 
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNVVSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150 

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132 
         G+G+ A IT+ + + +S E + AT D+++ 
Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193 
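
The identity and positive percentages in the HSP headers follow directly from the match counts. The helper below is a hedged sketch for re-deriving them from a header line; it assumes the percentages are truncated rather than rounded (which agrees with the figures printed in this report, e.g. 52/154 given as 33%), and it tolerates the varying separator characters found in scanned copies. The function name parse_fraction is illustrative.

```python
import re


def parse_fraction(line, label):
    """Return (matches, length, percent) for e.g. 'Identities = 30/103 (29%)'.

    Any non-digit characters between the label and the fraction are skipped,
    so damaged separators do not matter. The percent is truncated.
    """
    m = re.search(label + r"\D*?(\d+)/(\d+)", line)
    if m is None:
        return None
    hits, length = int(m.group(1)), int(m.group(2))
    return hits, length, 100 * hits // length


header = "Identities = 30/103 (29%), Positives = 55/103 (53%)"
print(parse_fraction(header, "Identities"), parse_fraction(header, "Positives"))
```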

Pedant information for DKFZphtes3_18f3, frame 2 



Report for DKFZphtes3_18f3.2 

[LENGTH] 193 

[MW] 19708.24 

[pI] 11.90 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 55.44 % 

SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc 

SEQ LPHPQAGGGGHQG 

SEG xxxxxxxxxxxxx 

PRD ccccccccccccc 



(No Prosite data available for DKFZphtes3_18f3.2) 
(No Pfam data available for DKFZphtes3_18f3.2) 

Pedant information for DKFZphtes3_18f3, frame 3 
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Report for DKFZphtes3_18f 3. 3 



[LENGTH] 248 

[MW] 27162.56 

[pI] 9.92 

[PROSITE] LEUCINE_ZIPPER 2 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 30.65 % 

[KW] COILED_COIL 12.10 % 



SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX . XXXXXXXXXXXXXXXXXXXX . . XXX 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 

COILS 

MEM 

SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV 

SEG XXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 

COILS 

MEM 

SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 



Prosite for DKFZphtes3_18f3.3 

PS00029 17->39 LEUCINE_ZIPPER PDOC00029 

PS00029 24->46 LEUCINE_ZIPPER PDOC00029 
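
The two overlapping leucine zipper hits can be reproduced with a simple pattern scan over the N-terminal residues. This is a minimal sketch rather than the tool used for the report: it assumes the PROSITE PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L, uses only the first 60 residues of the frame 3 peptide from the SEQ track, and assumes the second coordinate is one past the last matched residue. N_TERMINUS and zipper_hits are illustrative names.

```python
import re

# Residues 1-60 of the frame 3 peptide, copied from the SEQ track above.
N_TERMINUS = "MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS"

# PROSITE PS00029: L-x(6)-L-x(6)-L-x(6)-L. A lookahead is used because the
# two reported hits overlap and a plain search would skip the second one.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def zipper_hits(seq):
    """Return (start, end) pairs; start is 1-based, end is one past the match."""
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in LEUCINE_ZIPPER.finditer(seq)
    ]


print(zipper_hits(N_TERMINUS))
```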



(No Pfam data available for DKFZphtes3_18f3.3) 
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DKFZphtes3_1817 



group: cell structure and motility 

DKFZphtes3_1817 encodes a novel 1050 amino acid protein with weak partial similarity to 
ankyrins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins which interconnect integral proteins with the 
spectrin-based membrane skeleton. Thus the novel protein seems to be involved in coupling of 
cytoskeleton and cell membrane. 

The new protein can find application in modulation of cytoskeleton-membrane interactions. 



similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found 



1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT 
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TCATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT 
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 AGCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC 
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG 
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC CTATCACCTG TCCTTCGAGA 
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC 
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC 
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG 
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA 
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC 
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT 
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT 
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2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGCTGA 
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA 
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT 
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT 
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT 
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT 
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG 
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG 
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC 
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA 
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC 
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT 
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG 
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT 
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC 
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA 
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT 
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT 
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG 
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA 
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC 
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT 
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA 
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA 
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT 
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA 
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT 
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA 
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG 
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG 
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA 
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT 
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT 
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG 
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG 
4501 G 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (945-953) 
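The ORF coordinates and the peptide length reported above are tied together by codon arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon; the helper name `orf_peptide_length` is illustrative and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1          # nucleotides covered by the ORF
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3                      # one residue per codon


print(orf_peptide_length(134, 3283))      # ORF reported for frame 2
```

For the ORF listed here (134 bp to 3283 bp) the sketch gives 1050, which agrees with the stated peptide length.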



1 MALYDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST 
51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF 
101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE 
151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK 
201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ 
251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ 
301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY 
351 CLTSFEAAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC 
401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND 
451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL 
501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY 
551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET 
601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS 
651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE 
701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV 
751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN 
801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA 
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV 
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG 
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS 
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_1817, frame 2 

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin 
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21 

PIR:I49502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27 

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31 

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE 
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31 

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score = 
380, P = 8.2e-31 



>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 



HSPs: 

Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31 
Identities = 139/447 (31%), Positives = 207/447 (46%) 



Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
+G+T LH+AA+ GQ ++ LV+ GA VNA G TPL++A Q+ + V LL A+ 
Sbjct: 77 KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 136 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVES-CRL 558 
V +G TPL +A GHE+ V L+ Y + RL 
Sbjct: 137 QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 196 

Query: 559 DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 615 
D+ ++ G TPLHIAA + V + LL GAS + TPL A S+ +V+ 
Sbjct: 197 PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA--SRRGNVIM 254 

Query: 616 AYHLSFERRQKSSEAPVQSPQRSVDSISQESSTS-SFSSMSAGSR-QEETKKDYREVEKL 673 
L +R + E + + ++ S + G+ Q +TK + 
Sbjct: 255 V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 311 

Query: 674 LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 732 
A GD L+ VR LL++ E ++D T+ P H C R+AKV 
Sbjct: 312 --AAQGDHLDCVRLLLQYDAE-IDDI--TLDHLTP--LHVAAHC GHHRVAKVLL 358 

Query: 733 S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 791 
G N + +G +PLH+A ++ LLLK GA+ A PLH+A GH 
Sbjct: 359 DKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLP 418 

Query: 792 VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 851 
+VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 
Sbjct: 419 IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 478 

Query: 852 IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVV 896 
H +V+LLL + A+ + T + A + + +L ++ 
Sbjct: 479 RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 523 
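The percentages printed next to the Identities and Positives counts follow directly from the match counts and the number of aligned columns. The sketch below assumes the percentage is truncated rather than rounded, which is inferred from the values in this report (a later HSP lists 195/447 as 43%); the function name `percent` is illustrative only.

```python
def percent(matches: int, aligned: int) -> int:
    """Percentage of aligned columns, truncated to a whole number."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


identities, positives, columns = 139, 207, 447   # counts from the HSP above
print(f"Identities = {identities}/{columns} ({percent(identities, columns)}%)")
print(f"Positives = {positives}/{columns} ({percent(positives, columns)}%)")
```

With the counts of the first HSP this reproduces the reported 31% and 46%.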




Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30 
Identities = 130/447 (29%), Positives = 195/447 (43%) 

Query: 465 TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 524 
TPLH AA G + ++L+ GA + A +G +P+H+A Q + LLL Y A + 
Sbjct: 274 TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 333 

Query: 525 557 
+ TPLH+A GH K L+ + +C + 
Sbjct: 334 ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 393 

Query: 558 LDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
+ D EG TPLH+A+ G+ +++ LLQ GAS + N ETPL A + V 
Sbjct: 394 GASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 453 

Query: 615 EAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLL 674 
+ Y L + + + Q-fP 1+ +A T L 
Sbjct: 454 K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH TPLH 508 

Query: 675 RAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPASG 734 
A +G +E V LLE ++AT PH+KA+L + 
Sbjct: 509 IAAREGHVETVLALLE--KEASQACMTKKGFTP--LHVAAKYGKVRVAELLLER D 559 

Query: 735 LGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVK 794 
N ++G +PLHVA H D+++LLL G + + + PLH+A +Q +V + 
Sbjct: 560 AHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 619 

Query: 795 CLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEK 854 
LL N + + G TPL A GH E+VALLL A+ N N G T LH E 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 679 

Query: 855 HVFVVELLLLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
HV V ++L+ HG V + T + A N K+++ L 
Sbjct: 680 HVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 




Score = 367 (55.1 bits), Expect = 1.8e-29, Sum P(2) = 1.8e-29 
Identities = 131/489 (26%), Positives = 210/489 (42%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPL-CFCDDCEKLVSGRLNDPSVVTPFSRD 460 
HIAS GN V LL + + + PL C + +S L D ++ 
Sbjct: 244 HIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 302 

Query: 461 DRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKA 520 
G +P+H+AA + LL+ A ++ TPLH+A G+ V +LL A 
Sbjct: 303 KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 362 

Query: 521 SAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGV 580 
+ NG TPLH+AC H ++ L+ +D E G TPLH+A+ G+ + 
Sbjct: 363 KPNSRALNGFTPLHIACKKNHVRVMELLLK TGASIDAVTESGLTPLHVAS FMGHLPI 419 

Query: 581 IETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQSPQR 637 
++ LLQ GAS + N ETPL A ++++ + + K + P+ R 
Sbjct: 420 VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAAR 479 

Query: 638 SVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTE 693 
+ + + E++ + + +AG VE +L + + +T 
Sbjct: 480 IGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTP 539 

Query: 694 EDLEDAEDTVSAAD--PEFCHPLCQ CP-KCAPAQKRLAKVPA--SGLGVNVTS 741 
+ VA+ HP PALV G+ + 
Sbjct: 540 LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPA 599 

Query: 742 QDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNA 801 
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 600 WNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 659 

Query: 802 KPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVEL 861 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ 
Sbjct: 660 NGNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKF 719 

Query: 862 LLLHGASVQVLNK 874 
LL H A V K 
Sbjct: 720 LLQHQADVNAKTK 732 




Score = 345 (51.8 bits), Expect = 4.2e-27, Sum P(2) = 4.2e-27 
Identities = 146/506 (28%), Positives = 233/506 (46%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQK MCHPLCFCDDCEKLVSGRLNDPSVVTPFS 458 
H+AS G+ K V LL +E + T +K H +++V +N + V + 
Sbjct: 50 HLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQ-DEVVRELVNYGANVN--A 106 

Query: 459 RDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 518 
+ +G TPL++AA ++ L+ GA N G TPL +A Q+G+ ++V L++Y 
Sbjct: 107 QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 166 

Query: 519 KASAEVQDNNGNTP-LHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGY 577 
+V+ P LH+A ++D A V + D+ ++ G TPLHIAA + 
Sbjct: 167 GTKGKVRLPALHIAAR--NDDTRTAAVLLQNDPNPDVLSKTGFTPLHIAAHYEN 218 

Query: 578 QGVIETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQS 634 
V + LL GAS + TPL A N ++ ++ E + K P+ 
Sbjct: 219 LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 278 

Query: 635 PQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 693 
R+ E + + A +TK + A GD L+ VR LL++ 
Sbjct: 279 AARNGHVRISEILLDHGAPIQA KTKNGLSPIHM AAQGDHLDCVRLLLQYDA 329 






WO 01/12659 



PCT/IB00/01496 



Query: 


694 


Sbjct: 


330 


Query: 


730 


Sbjct: 


389 


Query: 


789 


Sbjct: 


449 


Query: 


849 


Sbjct: 


509 


Score 


= 243 


Identities ■ 


Query: 


404 


Sbjct: 


541 


Query: 


462 


Sbjct: 


601 


Query: 


522 


Sbjct: 


661 


Query: 


582 


Sbjct: 


718 


Score 


- 242 


Identities ■ 


Query: 


734 


Sbjct: 


229 


Query: 


794 


Sbjct: 


289 


Query: 


854 


Sbjct: 


349 


Score 


- 242 


Identities - 


Query: 


404 


Sbjct: 


508 


Query: 


4 62 


Sbjct: 


568 


Query: 


522 


Sbjct: 


628 


Query: 


582 


Sbjct: 


685 


Query: 


638 


Sbjct: 


745 


Score 


- 235 


Identities « 


Query: 


734 


Sbjct: 


625 



-CAPAQKRLAK 729 
C R+ + 



++ G +PLHVA+ 



+++ LL+ GA+ 



+ K+ T + A + K+ 



(36.5 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 

= 64/199 (32%), Positives = 97/199 (48%) 

HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
H+A+ G + E LL ++ H + PL L +L P +P S 

HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 600 

RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 

G+TPLH+AA Q + L+ G NA G TPLHLA Q+G+ + LLL +A+ 
NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 

+ + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++ 

GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV---MVDATTRMGYTPLHVASHYGNIKLV 717 

ETLLQNGASTEIQNRLKETPL 602 
+ LLQ+ A + +L +PL 
KFLLQHQADVNAKTKLGYSPL 738 

(36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 

= 63/176 (35%), Positives = 92/176 (52%) 



KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
+ LLD A K +G +P+ A G H + V LLLQ+ A 1+ T LH A 



H V ++LL GA + + LN 



+ C + + ++MELL 



AS+D V E+ 



80/284 (28%), Positives = 129/284 (45%) 



HIA+ G+ + V LL +E 



LL+ +G ++ ++G TPLH+A ++ 



GH + V L+ 



+ +GN+ G TPL.H+ A+ G+ V 
-NLGNKSGLTPLHLVAQEGHVPVA 684 



R+ TPL A 



-NSKILSVMEAYHLSFERRQKSSEAPV-QSPQR 637 
N K++ + + + K +P+ Q+ Q+ 



-ESSTSSFSSMSAGSRQEETKK- 
++ S S G+ K 



-DYREVEKLLRAVAD 679 
Y V +L+ V O 



(35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
58/165 (35%), Positives = 83/165 (50%) 



N S G +PLH+AA G A+++ LLL 



PLHL Q+GH V 



Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
L+ + G TPL A G+ +LV LLQH A +NA G + LH+A + 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 854 KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS--KIMELLQVV 896 
H +V LLL +GAS ++ T + A++ + ++L+VV 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789 

Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 67/202 (33%), Positives = 100/202 (49%) 



Query: 404 HIAS-GNQKEVERLLSQEDHDKDTVQKMCH--PLCFCDDC-EKLVSGRLNDPSVVTPFSR 459 
H+A+ G+ + RLL QD + D+ + H PL C V+ L D P SR 
Sbjct: 310 HMAAQGDHLDCVRLLLQYDAEIDDIT-LDHLTPLHVAAHCGHHRVAKVLLDKGA-KPNSR 367 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
G TPLH+A +++LL+ GA ++A G TPLH+A G+ + LL 
Sbjct: 368 ALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRG 427 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
AS V + TPLH+A GH + K L+ ++ + + TPLH AAR G+ 
Sbjct: 428 ASPNVSNVKVETPLHMAARAGHTEVAKYLLQ---NKAKVNAKAKDDQTPLHCAARIGHTN 484 

Query: 580 VIETLLQNGASTEIQNRLKETPLKCA 605 
+++ LL+N A+ + TPL A 
Sbjct: 485 MVKLLLENNANPNLATTAGHTPLHIA 510 




Score = 226 (33.9 bits), Expect = 7.0e-33, Sum P(2) = 7.0e-33 
Identities = 53/153 (34%), Positives = 83/153 (54%) 

Query: 743 DGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNAK 802 
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 803 PNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 862 
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 

Query: 863 LLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
LHAV K ++AQ++I+LL 
Sbjct: 721 LQHQADVNAKTKLGYSPLHQAAQQGHTDIVTLL 753 




Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11 
Identities = 51/157 (32%), Positives = 82/157 (52%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
+ T++ G++ LH+AAL G+ +++R L+ +GAN A++ PL++A Q+ H +VVK L 
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFL 130 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
L++ AN G TPL A GH +VA L+ +G ALH A 
Sbjct: 131 LENGANQNVATEDGFTPLAVALQQGHENVVAHLINYGTK GKVRLPALHIAARNDDT 186 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAE--QNSKIMELL 893 
+LL + + VL+K T + A +N + +LL 
Sbjct: 187 RTAAVLLQNDPNPDVLSKTGFTPLHIAAHYENLNVAQLL 225 




Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29 
Identities = 55/143 (38%), Positives = 68/143 (47%) 

Query: 463 GHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASA 522 
GHTPLH+AA G + L+ K A G TPLH+A + G V LLL A 
Sbjct: 503 GHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHP 562 

Query: 523 EVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIE 582 
NG TPLH+A + + D VK L+ S N G TPLHIAA+ V 
Sbjct: 563 NAAGKNGLTPLHVAVHHNNLDIVKLLLPRG-GSPHSPAWN--GYTPLHIAAKQNQVEVAR 619 

Query: 583 TLLQNGASTEIQNRLKETPLKCA 605 
+LLQ G S ++ TPL A 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLA 642 




Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28 
Identities = 54/185 (29%), Positives = 89/185 (48%) 

Query: 738 NVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 797 
N+ ++ G +PLH+ A G + +L+KHG A PLH+A G+ ++VK LL 
Sbjct: 662 NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 721 

Query: 798 DSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 857 
A N K G +PL A GH ++V LLL++GAS N ++ G T L A ++ 
Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781 

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917 
v ++L + V++ V+S PV + DV+E + +E + + 
Sbjct: 782 VTDVLKV VTDETSFVLVSDKHRMSFPETVDEILDVSEDEGEELISF 827 

Query: 918 KIRKK 922 
K ++ 
Sbjct: 828 KAERR 832 

Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 41/121 (33%), Positives = 67/121 (55%) 

Query: 486 GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 545 
G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V 
Sbjct: 35 GVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVV 94 

Query: 546 KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 605 
+ LV Y ++ ++KG TPL++AA+ + V++ LL+NGA+ + TPL A 
Sbjct: 95 RELVNY GANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVA 151 

Query: 606 L 606 
L 
Sbjct: 152 L 152 

Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06 
Identities = 89/318 (27%), Positives = 140/318 (44%) 

Query: 448 LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKG 507 
L + + V ++DD+ TPLH AA G +++ LL+ AN G TPLH+A ++G 
Sbjct: 457 LQNKAKVNAKAKDDQ--TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREG 514 

Query: 508 YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD 552 
+ L LL +AS G TPLH+A YG + L+ D 
Sbjct: 515 HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 574 

Query: 553 --VESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRL 597 
V LDI G+ G TPLHIAA+ V +LLQ G S ++ 
Sbjct: 575 VAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQ 634 

Query: 598 KETPLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 656 
TPL A M A LS +Q + +S + ++QE + 
Sbjct: 635 GVTPLHLAAQEGHAE-MVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVADVLIKH 690 

Query: 657 GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 716 
G + T + LA G++++V++LL+ + D+ +A+ + + PL Q 
Sbjct: 691 GVMV DATTR--MGYTPLHVASHYGNIKLVKFLLQH-QADV-NAKTKLGYSPLHQ 740 

Query: 717 CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 751 
+ + + +G N S DG++PL +A 
Sbjct: 741 AAQQGHTDI-VTLLLKNGASPNEVSSDGTTPLAIA 774 

Score = 162 (24.3 bits), Expect = 1.8e-07, Sum P(2) = 1.8e-07 
Identities = 48/149 (32%), Positives = 71/149 (47%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
V D ++ AA G D L++G + N + LHLA ++GH ++V L 
Sbjct: 5 VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 64 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
L GNT LAG E+V L+ +GA++NA + KG T L+ A E H+ 
Sbjct: 65 LHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 124 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885 
VV+ LL +GA+ V + T + A Q 
Sbjct: 125 EVVKFLLENGANQNVATEDGFTPLAVALQ 153 

Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26 
Identities = 38/135 (28%), Positives = 65/135 (48%) 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
+ G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y 
Sbjct: 42 NQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG 101 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
A+ Q G TPL++A H + VK L+ ++ EG TPL +A + G++ 
Sbjct: 102 ANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHEN 158 

Query: 580 VIETLLQNGASTEIQ 594 
V+ L+ G +++ 
Sbjct: 159 VVAHLINYGTKGKVR 173 

Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21 
Identities = 37/119 (31%), Positives = 56/119 (48%) 

Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ--DNNGNTPLHLACTYGHEDCVKALVYYDVE 554 
AT A + G ++ L H + ++ + NG LHLA GH V L++ ++ 
Sbjct: 13 ATSFLRAARSG--NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70 

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+ 
Sbjct: 71 ---LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127 

Query: 615 E 615 
+ 
Sbjct: 128 K 128 

Score = 106 (15.9 bits), Expect = 1.8e-01, Sum P(2) = 1.6e-01 
Identities = 34/121 (28%), Positives = 54/121 (44%) 

Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828 
+ GRADA A + G+ L + N+-+G LA GH ++V 
Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63 

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888 
LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q + 
Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123 

Query: 889 I 889 
+ 
Sbjct: 124 L 124 



Score = 40 (6.0 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 
+RRQ+ EVQ+++Q++ Q++ +K++R V 
Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQVVQGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKV 1669 

Score = 38 (5.7 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14 
Identities = 6/12 (50%), Positives = 10/12 (83%) 

Query: 806 KDLSGNTPLIYA 817 
+D++G T L+YA 
Sbjct: 1186 EDITGTTKLVYA 1197 



Pedant information for DKFZphtes3_1817, frame 2 



Report for DKFZphtes3_1817.2 



[LENGTH] 1050 
[MW] 117013.72 
[pI] 6.47 
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034c] 5e-13 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233c] 3e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04 
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att 
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (GABP) alpha GA bindini 4e-12 
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12 
[PIRKW] phosphotransferase 1e-19 
[PIRKW] nucleus 1e-13 
[PIRKW] potassium channel 5e-15 
[PIRKW] early protein 2e-13 
[PIRKW] tumor suppressor 1e-09 
[PIRKW] heterodimer 1e-14 
[PIRKW] potassium transport 5e-15 
[PIRKW] cell cycle control 1e-10 
[PIRKW] serine/threonine-specific protein kinase 1e-19 
[PIRKW] transmembrane protein 5e-15 
[PIRKW] transport protein 5e-15 
[PIRKW] DNA binding 2e-11 
[PIRKW] oncogene 1e-08 
[PIRKW] ATP 1e-19 
[PIRKW] protein kinase inhibitor 1e-09 
[PIRKW] voltage-gated ion channel 5e-15 
[PIRKW] phosphoprotein 4e-38 
[PIRKW] apoptosis 1e-19 
[PIRKW] liver 4e-09 
[PIRKW] integrin binding 3e-16 
[PIRKW] differentiation 2e-12 
[PIRKW] transforming protein 1e-08 
[PIRKW] alternative splicing 1e-40 
[PIRKW] coiled coil 1e-14 
[PIRKW] peripheral membrane protein 2e-38 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 2e-16 
[PIRKW] nucleotide binding 5e-15 
[PIRKW] phosphoric monoester hydrolase 1e-12 
[PIRKW] cytoskeleton 8e-39 
[PIRKW] calmodulin binding 1e-19 
[PIRKW] smooth muscle 1e-12 
[SUPFAM] ankyrin 1e-40 
[SUPFAM] death-associated protein kinase 1e-19 
[SUPFAM] ankyrin repeat homology 1e-40 
[SUPFAM] protein kinase homology 1e-19 
[SUPFAM] vaccinia virus 27.4K HindIII-C protein homology 3e-07 
[SUPFAM] int-3 transforming protein 1e-08 
[SUPFAM] unassigned ankyrin repeat proteins 2e-38 
[SUPFAM] notch protein 2e-12 
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13 
[SUPFAM] rel homology 2e-11 
[SUPFAM] EGF homology 2e-12 
[PROSITE] ATP_GTP_A 1 
[PFAM] Ank repeat 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.05 % 
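The LOW_COMPLEXITY figure can be read as the share of residues masked in the SEG rows of the report that follows. The sketch below is only an illustration under that assumption: the count of 32 masked residues is taken from the two runs of `x` (10 and 22 characters) in the SEG rows, and `masked_percent` is a hypothetical helper, not a Pedant function.

```python
def masked_percent(masked_residues: int, length: int) -> float:
    """Share of residues flagged by a masking program, as a percentage."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * masked_residues / length, 2)


# Assumption: 32 masked residues (runs of 10 and 22 'x') in a 1050-residue peptide.
print(masked_percent(32, 1050))
```

Under these assumptions the result matches the 3.05 % listed in the report.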



SEQ MALYDEDLLKNPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP 
SEG 
1awcB 

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR 
SEG 
1awcB 

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA 
SEG 
1awcB 

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVEIYVHHEIYNLIFKYVGTMEASEDAAFN 
SEG 
1awcB 

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ 
SEG 
1awcB 

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE 
SEG xxxxxxxxxx 
1awcB 

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE 
SEG 
1awcB 

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID 
SEG 
1awcB 

SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG 
SEG 
1awcB 

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET 
SEG 
1awcB 

SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ 
SEG xxxxxxxxxxxxxxxxxxxxxx 
1awcB 

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC 
SEG 
1awcB 

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP 
SEG 
1awcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTTTTCCH 

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN 
SEG 
1awcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHHHHHHHHCCCTTTTEE 

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCV 
SEG 
1awcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHC 

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD 
SEG 
1awcB 

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR 
SEG 
1awcB 

SEQ RHTVEDAVVSQGPEAAGPLSTPQEVSASRS 
SEG 
1awcB 



Prosite for DKFZphtes3_1817.2 
PS00017 945->953 ATP_GTP_A PDOC00017 
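The PS00017 hit can be checked against the peptide with a regular expression. The sketch below uses the published PROSITE pattern for ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST], and scans residues 901 to 960 as they appear in the peptide listing; the function name `find_motif` and the choice of segment are illustrative assumptions. The match spans residues 945 to 952, so the reported end of 953 would be consistent with an end coordinate one past the last matched residue; that reading is an assumption, not something the report states.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(segment: str, first_residue: int = 1):
    """Return (start, end, match) tuples with 1-based inclusive coordinates."""
    hits = []
    for m in P_LOOP.finditer(segment):
        start = first_residue + m.start()
        end = first_residue + m.end() - 1
        hits.append((start, end, m.group()))
    return hits


# Residues 901-960 of the frame 2 peptide, copied from the listing.
segment = "ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD"
print(find_motif(segment, first_residue=901))
```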



Pfam for DKFZphtes3_1817.2 



HMM_NAME Ank repeat 

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+AA ++ +.+++LL+++GA +N 
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN 490 

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G TPLH+A++ + ++ LL1* + A+ 

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523 

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+A+ Y+++++V+ L+ + 
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556 

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+TPLHIAAR + +++ LLQ+GA+ 

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592 

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G +PLH+AA +++ +++RLLL+HGA+ 
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771 
36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
PLH+A+++++ ++V+ LL+ +A +N 

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804 

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPL++A+ ++ E+V LLLQHGA+IN 
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837 

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+T+LH A+++ +V +V+LLL HGA++ 

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870 

group: testes derived 

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae 
protein YFL046w. 

The protein contains an RGD cell attachment site. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405.0/.3 cR from top of Chr11 linkage group" 
insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found 



1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 

51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 

101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 

151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 

201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 

251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 

301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 

351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 

401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 

451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 

501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 

551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 

601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 

651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 

701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG 

751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT 

801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 

851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 

901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 

951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 

1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 

1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 

1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 

1151 GGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 

1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 

1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 

1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA 

1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS419346 from database EMBL: 
human STS WI-13569. 
Score = 2154, P = 8.6e-91, identities = 446/459 

Entry HS1292427 from database EMBL: 
human STS SHGC-50338. 
Score = 1737, P = 7.2e-72, identities = 359/369 

Entry HS253344 from database EMBL: 
human STS WI-13893. 
Score = 1578, P = 1.0e-64, identities = 358/397 



Medline entries 



No Medline entry 

Peptide information for frame 3 



ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 
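The frame number in the heading is consistent with the ORF start position taken modulo 3. This is a minimal sketch, assuming forward frames are numbered 1 to 3 from the first base of the cDNA; `reading_frame` is an illustrative name.

```python
def reading_frame(orf_start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based ORF start position."""
    if orf_start_bp < 1:
        raise ValueError("start position must be 1-based")
    return (orf_start_bp - 1) % 3 + 1


print(reading_frame(156))   # ORF start reported for this clone
```

An ORF starting at 156 bp falls in frame 3 under this convention, and the ORF starting at 134 bp in the previous entry falls in frame 2.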



1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD 

51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK 

101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ 

151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT 

201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY 

251 RFWK 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19f19, frame 3 

SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME 
I., N = 1, Score = 144, P = 8.4e-09 

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 138, P = 5.4e-08 

>SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I. 
Length = 211 



Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09 
Identities = 34/121 (28%), Positives = 67/121 (55%) 

Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128 

LE G+ AETI + + ++ +L + K + +A+QE ++ QQ L IRK + 
Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104 

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 

Pedant information for DKFZphtes3_19f19, frame 3 
Report for DKFZphtes3_19f19.3 

[LENGTH] 254 

[MW] 29505.73 

[pI] 6.99 

[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12 

[PROSITE] RGD 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.12 % 

[KW] COILED_COIL 11.02 % 

SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT 

SEG 

PRD ccehhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc 

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD 

SEG 

PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 
MEM 

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM MMMMMMM 

SEQ TCLAIALGFYRFWK 

SEG 

PRD hhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMM 



Prosite for DKFZphtes3_19f19.3 
PS00016 15->18 RGD PDOC00016 



(No Pfam data available for DKFZphtes3_19f19.3) 
DKFZphtes3_19j17 



group: testes derived 

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to the C. elegans 
Y40B1A.2 protein. 

The novel protein contains two Prosite WW/rsp5/WWP domain signatures. 

The WW domain (also known as the rsp5 or WWP domain) was originally discovered as a short conserved 
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP 
protein, mouse NEDD-4 and yeast RSP5. 

The domain is repeated up to four times in some proteins. It has been shown to bind proteins with 
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It 
appears to contain beta-strands grouped around four conserved aromatic positions, generally 
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a conserved 
Pro. It is frequently associated with other domains typical of proteins in signal 
transduction processes. 
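
The proline motif [AP]-P-P-[AP]-Y quoted above is written in Prosite pattern notation and can be tested against a candidate ligand with an ordinary regular expression. This is a minimal sketch only: the function name `find_ww_ligand_motifs` and the three example peptides are hypothetical and are not taken from this listing.

```python
import re

# Prosite-style [AP]-P-P-[AP]-Y rewritten by hand as a regular expression.
WW_LIGAND = re.compile(r"[AP]PP[AP]Y")


def find_ww_ligand_motifs(peptide: str):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in WW_LIGAND.finditer(peptide)]


# Hypothetical peptides, used only to exercise the pattern.
for peptide in ("GTPPPPYTVG", "SAPPAYE", "GTPPLPYTVG"):
    print(peptide, find_ww_ligand_motifs(peptide))
```

The third peptide carries a Leu in the fourth position and is therefore not reported.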

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to C. elegans Y40B1A.2 

there are two long ORFs in this cDNA according to EST: 
HS12146/HS75086/AA923755/MMAA1733S remaining intron at bp 1506-1733 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp 

Poly A stretch at pos. 2740, no polyadenylation signal found 



1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG 
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT 
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA 
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG 
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT 
1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA 
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGAAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 
1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACAGAAT TCTTCTGCCC GATCCACGTG 
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC 

2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA 

2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC 

2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA 

2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT 

2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT 

2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA 

2451 TTTTTCAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA 

2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA 

2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT 

2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA 

2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA 

2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA 

2751 AAAAAAAAAA AA 



BLAST Results 



Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1188I5 map 10p11.2-10p12, 
complete sequence. 

Score = 2130, P = 0.0e+00, identities = 426/426 
12 exons matching Bp 492-2740 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 

1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ 
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS 
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE 
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 2 
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116) 
WW_DOMAIN_1 (90-116) 



1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY 

51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 

101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 

151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 

201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT 

251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 

301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 

351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR 

401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 3 

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid 
Y40B1A, N = 1, Score = 144, P = 1.8e-09 



>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 
Length = 120 

HSPs: 

Score = 144 (21.6 bits), Expect = 1.8e-09, P = 1.8e-09 
Identities = 30/67 (44%), Positives = 43/67 (64%) 
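
The percentages printed next to the identity and positive counts can be recomputed from the fractions. The sketch below assumes, based only on the figures in this listing, that the report truncates rather than rounds (30/67 is about 44.8% but is printed as 44%); the helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Identities 30/67 and Positives 43/67 from the HSP above
print(blast_percent(30, 67), blast_percent(43, 67))
```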

Query: 90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK DRDYRRE 145 

W+E +SSSGK YYYN +TE+SQW+KP EW E +++ K VN P+ DR Y 
Sbjct: 11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69 

Query: 146 VMQATATS 153 

+ Q +++S 
Sbjct: 70 IGQLSSSS 77 



Pedant information for DKFZphtes3_19j17, frame 2 



Report for DKFZphtes3_19j17.2 



[LENGTH] 209 

[MW] 22873.85 

[pI] 9.95 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 13.40 % 



SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ 

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ 

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 



(No Prosite data available for DKFZphtes3_19j17.2) 
(No Pfam data available for DKFZphtes3_19j17.2) 



Pedant information for DKFZphtes3_19j17, frame 3 



Report for DKFZphtes3_19j17.3 



[LENGTH] 436 

[MW] 47716.62 

[pI] 8.71 

[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKL012w] 2e-04 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152c] 6e-04 

[BLOCKS] BL01159 WW/rsp5/WWP domain proteins 

[PROSITE] WW_DOMAIN_1 2 

[PFAM] WW/rsp5/WWP domain containing proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 22.48 % 
SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS 

SEG xxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhhhhcccccccccccccccccccccccccccccc 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL 

SEG 

PRD hhcccccceeehhhhhhhhhhhccccccccccccccccccceeeecccchhhhhhhccee 

SEQ VWNGSIMVQRLLQPSG 

SEG 

PRD eeccchhhhhhccccc 



Prosite for DKFZphtes3_19j17.3 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 



Pfam for DKFZphtes3_19j17.3 



HMM_NAME WW/rsp5/WWP domain containing proteins 

HMM *LPsGWEeHWDpsGRpWYYWNHETkTTQWEpP* 

+ ++W EH++ SG+ YY+N T+ +QWE+P 
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP 115 
DKFZphtes3_1c1 



group: signal transduction 

DKFZphtes3_1c1 encodes a novel 632 amino acid putative GTPase-activating protein, related to 
the Drosophila rotund transcript and human n-chimaerin. 

The Rac small GTPase is associated with type I phosphatidylinositol 4-phosphate 5-kinase and 
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find application in modulating/blocking the response to a cellular 
receptor. 

similarity to GTPase-activating proteins 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found 



1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 

51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 

101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 

151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT 

201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG 

251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 

301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 

351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 

401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 

451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 

501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 

551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 

601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 

651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 

701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 

751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT 

801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 

851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 

901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 

951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 

1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 

1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 

1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 

1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 

1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 

1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 

1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 

1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 

1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT 

1451 CAGCAAAGTG GATGATATCC ATGCTATCTG TAGCCTTCTA AAAGACTTTC 

1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 

1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 

1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 

1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 

1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT 

1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA 

1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 

1851 ATGATGGTGG AGCAAGAGAA CATTGACCCC CTACATGTCA TTGAAAACTC 

1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 

1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 

2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 

2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 

2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 

2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 

2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT 

2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 

2301 CAAGTCACAT AATATCAGCA CTTTGTGCTG GTCATTGTTG GGAGCTTTTA 

2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 

2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT 

2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 

2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 

2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT 
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CCCCATTATG AACTGAGCTA 

2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGACA GAACTAGTGA 

2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 

2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCGTATT TATCTCTGAT 

2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT 

2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT 

2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 

2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT 

3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 

3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 

3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 

3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 

3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA 



BLAST Results 



Entry U82984 from database EMBLEST: 

Homo sapiens DRES 56 mRNA sequence. 

Score = 8775, P = 0.0e+00, identities = 1757/1758 

matches 3' end 



Medline entries 



93074974: 

Developmental regulation and neuronal expression of the mRNA of rat 
n-chimaerin, a p21rac GAP: cDNA sequence. 

93024458: 

A Drosophila rotund transcript expressed during spermatogenesis and 
imaginal disc morphogenesis encodes a protein which is similar to human Rac 
GTPase-activating (racGAP) proteins. 



Peptide information for frame 3 



ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 
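
A peptide listed in these reports can be checked against the cDNA by translating the insert from the stated ORF start. The following is a small sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are introduced here for illustration only. The fragment is nucleotides 225 to 251 of the insert shown above.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring blanks and any trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 225-251 of the insert (start of the ORF from 225 bp).
fragment = "ATGGAT ACTATGATGC TGAATGTGCG G"
print(translate(fragment))  # MDTMMLNVR
```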



1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE 
51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI 
101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG 
151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK 
201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG 
251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC 
301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM 
351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF 
401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD 
451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF 
501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI 
551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS 
601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK 



BLASTP hits 



Entry CEK08E3_4 from database TREMBLNEW: 

gene: "K08E3.6"; Caenorhabditis elegans cosmid K08E3 

Score = 452, P = 2.6e-48, identities = 126/377, positives = 189/377 

Entry A48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pel -7 - fruit 
fly (Drosophila melanogaster) (fragment) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry B48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit 
fly (Drosophila melanogaster) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 
Entry DM22539_1 from database TREMBL: 

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP 
(rotund) gene, complete cds. 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry S29128 from database PIR: 
N-chimerin - rat 

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253 



Alert BLASTP hits for DKFZphtes3_1c1, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_1c1, frame 3 



Report for DKFZphtes3_1c1.3 



[LENGTH] 632 

[MW] 71026.84 

[pI] 9.08 

[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d - 
fruit fly (Drosophila melanogaster) 2e-46 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR127w] 5e-09 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08 

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08 

[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins 

[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins 

[SCOP] d1pbwa 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 1e-55 

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 1e-49 

[PIRKW] breakpoint cluster region 1e-19 

[PIRKW] transmembrane protein 7e-08 

[PIRKW] brain 3e-22 

[PIRKW] alternative splicing 1e-19 

[PIRKW] P-loop 2e-25 

[SUPFAM] CDC24 homology 3e-22 

[SUPFAM] bcr protein 3e-22 

[SUPFAM] myosin motor domain homology 2e-25 

[SUPFAM] pleckstrin repeat homology 4e-10 

[SUPFAM] LIM metal-binding repeat homology 2e-09 

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 13 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 1 

[PROSITE] DAG_PE_BINDING_DOMAIN 1 

[PFAM] Phorbol esters / diacylglycerol binding domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.22 % 

[KW] COILED_COIL 8.54 % 

SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK 

SEG 

COILS CCCCCCCCCCCC 

1rgp- 



SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

1rgp- 

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR 

SEG 

COILS 
1rgp- 



SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESIVAKTTVTVPNDGGPIEAVSTIETVP 

SEG 

COILS 

1rgp- 

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC 

SEG 

COILS 

1rgp- 

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP 

SEG 

COILS 

1rgp- 

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS 

SEG 

COILS HHHHHHHCCCCCG * GGCCCCHHHHH 

SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL 

SEG 

COILS 

1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH 

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKVVERLLSLPLEYWS 

SEG 

COILS 

1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC 

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS 

SEG xxxxxxxxxxx 

COILS 

1rgp- 

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK 

SEG xxx 

COILS 

1rgp- 



Prosite for DKFZphtes3_1c1.3 



PS00001 212->216 ASN_GLYCOSYLATION PDOC00001 
PS00004 141->145 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 182->186 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 246->250 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005 
PS00005 174->177 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005 
PS00005 313->316 PKC_PHOSPHO_SITE PDOC00005 
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005 
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005 
PS00005 595->598 PKC_PHOSPHO_SITE PDOC00005 
PS00005 606->609 PKC_PHOSPHO_SITE PDOC00005 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006 
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006 
PS00006 270->274 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 387->391 CK2_PHOSPHO_SITE PDOC00006 
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006 
PS00006 410->414 CK2_PHOSPHO_SITE PDOC00006 
PS00006 449->453 CK2_PHOSPHO_SITE PDOC00006 
PS00006 489->493 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00007 46->55 TYR_PHOSPHO_SITE PDOC00007 
PS00007 376->385 TYR_PHOSPHO_SITE PDOC00007 
PS00008 131->137 MYRISTYL PDOC00008 
PS00008 150->156 MYRISTYL PDOC00008 
PS00008 276->282 MYRISTYL PDOC00008 
PS00008 377->383 MYRISTYL PDOC00008 
PS00008 388->394 MYRISTYL PDOC00008 
PS00008 623->629 MYRISTYL PDOC00008 
PS00009 303->307 AMIDATION PDOC00009 
PS00479 287->336 DAG_PE_BINDING_DOMAIN PDOC00379 
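
Short Prosite hits such as the 141->145 and 144->148 entries can be reproduced by scanning the peptide with regular expressions. The sketch below is an illustration under stated assumptions, not the scanner used for this report: the three patterns are hand translations of the Prosite consensus for PS00004, PS00005 and PS00006, the function name `scan_fragment` is hypothetical, matches are taken as non-overlapping, and the end position is reported one past the last matched residue, which is how the ranges in this listing appear to be written.

```python
import re

# Hand-translated Prosite consensus patterns (assumed forms).
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",  # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",      # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",   # [ST]-x(2)-[DE]
}


def scan_fragment(fragment: str, first_residue: int):
    """Return (name, start, end) hits; start is 1-based, end is exclusive."""
    hits = []
    for name, regex in PATTERNS.items():
        for m in re.finditer(regex, fragment):
            hits.append((name, first_residue + m.start(), first_residue + m.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Residues 141-150 of the 632 amino acid peptide (KRLSTIDESG).
for name, start, end in scan_fragment("KRLSTIDESG", 141):
    print(f"{start}->{end} {name}")
```

For this fragment the sketch reports one cAMP site starting at 141 and one CK2 site starting at 144, and no PKC site.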



Pfam for DKFZphtes3_1c1.3 



HMM_NAME Phorbol esters / diacylglycerol binding domain 

HMM *HrFmrHTFrqPTWCDHCgeFIWGWgKQGYQCQnCgMNCHKRCHelVPmm 

H+F+ +T + P +C CG +I +GK ++C +C+++ H +C+ + P 
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 334 

HMM C* 
C 

Query 335 C 335 
DKFZphtes3_1g13 



group: intracellular transport and trafficking 

DKFZphtes3_1g13 encodes a novel 1007 amino acid protein with similarity to human 256 kD 
golgin. 

The new protein contains 7 leucine zippers and seems to be involved in protein-protein 
interaction in the Golgi apparatus. The very similar rat cp151 shows 
haploid-specific transcription in Mus musculus testis. 

The new protein can find application in modulating protein traffic in the Golgi apparatus, 
especially in human haploid germ cells. 



similarity to 256 kD golgin, strong similarity to rat "cp151" 
21 exons encoded on AC004682 

EST from a testis library, two mouse ESTs of a testis cDNA library, 
rat cp151 shows haploid-specific transcription! 
testis or haploid-specific transcription 

Sequenced by DKFZ 

Locus: map="16q22.2" 

Insert length: 3405 bp 

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373 



1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT 
51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA 
101 AGGTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG 
151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA 
201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT 
251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC 
301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG 
351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT 
401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT 
451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC 
501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA 
551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG 
601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC 
651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA 
701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC 
751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT 
801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA 
851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT 
901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG 
951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA 
1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG 
1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA 
1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT 
1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC 
1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA 
1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA 
1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAGAAA 
1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG 
1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG 
1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG 
1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA 
1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA 
1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA 
1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA 
1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA 
1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC 
1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT 
1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC 
1901 GTATCAAGCA CCAGCACAGC GAGCAAGGCT CCATCAAATG CAAGTTAGAA 
1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT 
2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT 
2051 TGCGGCAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA 
2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC 
2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA 
2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC 
2251 CTGCAGGCCC AGCTGGACAA AGCTCTGCAG AAGGAGAAGG ACTATCTCCA 
2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG 
2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC 
2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA 
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC 

2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 

2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 

2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 

2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC 

2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC 

2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 

2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC 

2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA 

2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA 

2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 

3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 

3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 

3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC 

3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 

3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 

3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 

3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 

3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 

3401 AAAAA 



BLAST Results 



Entry AC004682 from database EMBLNEW: 

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete 
sequence. 

Score = 1291, P = 0.0e+00, identities = 265/272 



Medline entries 



No Medline entry 

Peptide information for frame 1 



ORF from 133 bp to 3153 bp; peptide length: 1007 

Category: similarity to known protein 

Prosite motifs: LEUCINE_ZIPPER (83-105) 

LEUCINE_ZIPPER (90-112) 

LEUCINE_ZIPPER (97-119) 

LEUCINE_ZIPPER (104-126) 

LEUCINE_ZIPPER (403-425) 

LEUCINE_ZIPPER (410-432) 

LEUCINE_ZIPPER (918-940) 
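
The four overlapping zipper hits between residues 83 and 126 arise because leucines recur every seventh residue over a longer stretch than one motif. A minimal sketch, assuming the usual Prosite leucine zipper consensus L-x(6)-L-x(6)-L-x(6)-L and the same end-exclusive numbering as the listing; the function name `leucine_zippers` is hypothetical, and the fragment is residues 81 to 130 of the 1007 amino acid peptide in this report.

```python
import re

# A lookahead lets overlapping occurrences of the 22-residue motif be reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(fragment: str, first_residue: int):
    """Return (start, end) pairs; start is 1-based, end is exclusive."""
    return [
        (first_residue + m.start(), first_residue + m.start() + len(m.group(1)))
        for m in ZIPPER.finditer(fragment)
    ]


# Residues 81-130 of the peptide.
fragment = "RQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLHH"
print(leucine_zippers(fragment, 81))
```

With this fragment the sketch returns the ranges 83-105, 90-112, 97-119 and 104-126.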



1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA 

51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH 

101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN 

151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK 

201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS 

251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 

301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM 

351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD 

401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ 

451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 

501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE 

551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS 

601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 

651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ 

701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE 

751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF 

801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM 

851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 

901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA 

951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH 

1001 CPGSSYC 



BLASTP hits 



Entry HS417401_1 from database TREMBL:

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete
cds.

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862

Entry SCINTANA_1 from database TREMBL:

Saccharomyces cerevisiae integrin analogue gene, complete cds.
Score = 404, P = 6.2e-34, identities = 199/897, positives = 423/897

Entry HS6802_2 from database TREMBL: 

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain,
ESTs, CA repeat, STS and GSS.

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028

Entry AF092090_1 from database TREMBL:

product: "cplSl"; Rattus norvegicus cplSl mRNA, partial cds.

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733



Alert BLASTP hits for DKFZphtes3_lgl3, frame 1 

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin,
N = 1, Score = 411, P = 4.4e-34

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230
mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene,
complete cds., N = 1, Score = 404, P = 7.1e-34



>TREMBL:HSGOLGIN_l product: "256 kD golgin"; H . sapiens mRNA for golgin 
Length = 2,185 

HSPs: 

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34
Identities = 212/816 (25%), Positives = 420/816 (51%)

Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203 

+M + E+ G L +EQL ++ +ERSL+ YR KY ++ ++L+ + K LQ 

Sbjct: 119 DMDSEAEDLVGNSDSLNKEQLI QRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262 

G 1+ Q D S RI +Q Q+ +K L E + + ++D I L+ + 

Sbjct: 176 G ILSQSQ DKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227 

Query: 263 LAC SNALVLEREKALIKLQADFASCTATHRYPPSSSEEC-ED--IKKILKHLQE 313 

++ + + ++ K L *L+ A P S E ED K L+ LQ+ 

Sbjct: 220 VS LLKQRLRNGPMNV DVLKPL PQLE PQ- AEVFTKEEN PE50GE PVVEDGT S VKTLET LQQ 286 

Query: 314 QKDSQ CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ Q C ++ ++ L E EA+ EQ ++++ K++ DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 34 4 

Query: 367 HIERK DKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV — QNSL 424 

+ +D I Q Q+ + ET++ + + L+ K+E + +L ++ Q+ Q 
Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVIAETKR QMHETLEMKEEEI AQLRSRI KQMTTQGEE 400 

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 
Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ — KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456 

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL ++LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK--LHEKELARKEQELTKKLQTRERE--FQEQMK 512 

Query: 543 TSNRKRVEELSLELSEALRKLENSDKEKRQLQKT--VAEQDMKMNDMLDRIKHQHREQGS 600 

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ + 
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED KREQLKKSKEHEKLMEG ELEALR-QEFKKKDKTL 651 

++ LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQEHKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE IALQKESLMS 706 

+ ++ + E E LR + C + E+ L +K Q I+++N++ + +++ L S 
Sbjct: 632 QVLKQQVQTEMEKLREK CEQEKETLLKDKEI1FQAHIEEMNEKTLEKLDVKQTELES 688 

Query: 707 LQAQLDKALQKEKHYLQT--TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764 
L ++L + L K +H L+ ++ K+ D + ++ A D+ Q V S K + 
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKHKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745 

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824 

S+ +T+ KA L+++I E +K+ + L++ + + E ++ + +L++ S ++ 
Sbjct: 746 SIQRTE— KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802 

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ + E h + + KD C 

Sbjct: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937 

L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + +++EN 

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT— QV-YESKLEDGNKEQEQTKQILVEKENM 912 

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 
Sbjct: 913 I LQMREGQKKE I E I LTQKLS AK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26
Identities = 216/953 (22%), Positives = 468/953 (49%)

Query: 2 KDEAGERDRE--VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM EEAM 51 

K+E E D E V S K L +LQ +K ++ KR ++T+Q + + C +EA+ 
Sbjct: 260 KEENPESDGEPWEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319 

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ----LQQLK--KKLLVLQQELEFHTEELQ 105 

D++ + ++ + + LR ++QL+ K +++ + + + H E L+ 

Sbjct: 320 QEQLDERLQELEKIKDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMH-ETLE 378 

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164 

+ Q +S +++ T+ L K K + EE +T +K A+ +L 

Sbjct: 379 MKEEEIAQLRSRIKQMTTQGEELREQ-KEKSERAAFEELEKAL STAQKTEEARRKLK 434 

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222 

A ++I ++E++ R Q LS + E+++ K + ++ + Q+ K K 
Sbjct: 435 AEMOEQIKTIEKTSEEERISLQQELSRVKQEWDVMKKSSEEQIAKL--QKLHEKELARK 492 

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282 

+ T +E +E Q+++ +K SQ + L ++ + +L LE ++LQ 
Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL— KISQEKEQQESLALEE LELQK 544 

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341 

AT + +EE + + L++ ++E +N KDL V LEA ++ 

Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS— LQENKNQSKDLAVHLEAEKNK 600 

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400 

+1 +K++LL++A K++ Q +++L+ E E +K TL KD 
Sbjct: 601 HNKEITVMVEKHKTE1.ESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659 

Query: 401 K FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453 

K ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK 

SbjCt: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717 

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511 

K E E K + + + ++ ++ ++ Q+ + K++ L++ + L++ 

SbjCt: 713 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQ 776 

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570 

++ +AO 1+ + ELQ + + + Q++ ++ + +KL + + E+ 

SbjCt: 777 AHVENLEAD-IKRSEGELQQASAKIiDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835 

" Query: 571 RQLQKTVAEQDMKMNDM LD—RIKHQHREQGSIK— CKLEEDLQEATKLLEDKREQL 623 

L K VAE + + 0+ LD +1+ Q Q K ++E+ + + T++ EKE 
Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895 

Query: 624 KKSKEHEK--LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681 

K +E K L+E E L+ +K K ++ ++KL + +++ + T+ ++ 
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954 

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741 

K +Q +++ + + K+ L+ +A+L K L E L+ + ++ ++A + A 
Sbjct: 955 KMEKVKQKAKEMQETL- — KKKLLDQEAKLKKEL-tENTALELSQKEKQFNAKMLEMAQA 1009 

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800 

++ A+ +L T++ + ++ SLT+ + +L + I +E KKLN + +L+ 
SbjCt: 1010 NSAGISOAVSRLE--TNQKEQIE-SI.TEVHRR--ELNDVISIWE-— KKLNQQAEELQEI 1061 

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW--QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858 

H E+++ ++++ E+ ++L + +K+ N ++ KEE +++ + L+E L 
SbjCt: 1062 H EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116 
Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-- WAKQQKVANEKLGNQLREQVNYI- 915 

+ L Q K L + + +L++ + ++Q V + L + + +V+ + 

Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVOLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175 

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951 

+KL + S+ ++ NK L+ K +E KK+ E 
SbjCt: 1176 SKLKTTDEEFQ5LKS5HEKSNKSLEDKSLEFKKLSEE 1212 

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26
Identities = 215/951 (22%), Positives = 433/951 (45%)

Query: 10 REVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALAFEESE 69 

+E + +++L L+++ KQKL+ EA + H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL EAEKNKHNKEIT— VMVEKHK 613 

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129 

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK--H-QQDALWTEKLQVLKQQYQTEMEKLREK— CEQEKETLLKD-KEI IFQA 666 

Query: 130 HHCKLKE DEVILYEEEMGNHNENTGEKL— HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 
SbjCt: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K----YQSSLSNIELLECQVKMLQGE--LGGIMGQEPENKGDHSKVRIYTSPCMIQEHQE 237 

+ +Q++I+E+V++EL +Q +K+ +++ 

SbjCt: 727 EQKNHHQQQVDSI-IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
SbjCt: 785 D I KRS EGELQQAS AKLDV FQS YQS AT H EQT KA YEEQLAQLQQKLLDLE -T ERI L 837 

Query: 298 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE + + +K + ++ E 
Sbjct: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891 

Query: 357 LH-GLREETSAHIERKDKDITILQCRL-QELQLEFTETQKLTLKKDKF--LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+ ++E TQKL+ K+D L E+ E + 

SbjCt; 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEIL-TQKLSAKEDSIHILNEEYETKFK 950 

Query: 412 ELEKKLTQVQNSLLK-----KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466 

EKK+ +V+ + K+K L+ + + f ELE T E Q K K+ K L+ Q 

SbjCt: 951 NQEKKHEKVKQKAKEHQETLKKKLt*DQEAKLKKELENTALELSQ-KEKQFNAKMliEH-AQ 1008 

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526 

+ +A RLQQ+ + L D +KL Q+A+ +QE+ 

SbjCt: 1009 ANSAGISDAVS — RLETNQKEQIESLTEVHRRELNDVISIWEKKL— NQQAEELQEIH- 1062 

Query: 527 ELQMLQKE SSMAEK EQT SNRKRV— EELSLELSEALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

SbjCt: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 575 KTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L++++ + L+E L E LE+++++ K+ 

SbjCt: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182 

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E ++L+ +K +K+L++ S + ++ +E L +L C + E+ L T++ + + 
Sbjct: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT QALE 750 

K A+ + Q + K KE ++T E +A R+ Q+ L - QA 
Sbjct: 1242 KTNAILSR-ISHCQHRTTKV— KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRK--LRGFHQESE 805 

+L ++ KS++ + +K L++E ++ + T+L+K + + 

Sbjct: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAvTL 1357 

Query: 806 LEVHAFDKKLE--EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863 

++ +KK+E +S Q+ Q QN + L+ KE + +++ K LL D + 

Sbjct: 1358 MKEELKEKKVEISSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE QVNYIAKLSG 920 

++ K+ D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475 

Query: 921 EKDH-LHSVMVHLQQENKK LKKEIEEKKMKAE 951 

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQINLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25
Identities = 209/953 (21%), Positives = 438/953 (45%)
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNS SHD 56 

MK + E+ ++ L+ K L+ + + + + R+R+ + ++ +E++ + S + 

Sbjct: 470 MKKSSEEQIAKLQKLHEKELARK-EQELTKKLQTREREFQEQMKVALEKSQSEYLKISQE 528 

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116 

K+Q ++LA EE E++ K+ I + + KL LQQE E + + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAILTESEN— KLROLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE--VILYEE-----EMGNHNENT--GEKLHLAQEQLALA 167 

+ Q+ DL + K K ++ ++ E+ E H ++ EKL + ++Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL--ERSLNLYRDK---YQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

+K+ + L +DK +Q+ + N + LE ++ + Q EL + + E 

Sbjct: 642 MEKLREKCEQBKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700 

Query: 221 SKVRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ S ++++ +T K E+ K+ +Q + Q+ + + + + ++R + +K 
Sbjct: 701 HKLEEELS — V L K D- - QT DKMKQELEAKMDEQKNHHQQQV DSI I K EH EVS I QRT EKALKD 756 

Query: 281 QADFASCTATHR--YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+++ +K + + ++ +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398 

EQ + + ++ LE + L ++ A +E + KD+ C EL +QL + 

Sbjct: ■ 817 YEEQLAQLQQKLLDLETERILLTKQV-AEVEAQKKDV CT--ELDAHKIQVQDLMQQ 869 

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 

+ K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 
Sbjct: 870 LEK QNSEMEQKV-KSLTQVYESKLEDGNKEQEQTKQILVEKENMILQMREGQKKEIE 925 

Query: 458 C--KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK--LQKGLLL 513 

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKAK EMQETLKKKLLDQEAKL 981 

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

K+ +T EL Q+E Q K HA+ V L E + L ++ +R+ 

Sbjct: 982 KKELEHTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL--TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKS KE 628 

L ++ + K+N + ++ H Q K + +L++ L ++E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQBIHEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688 

+ L L+++ K+K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV--NS--LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL---DKALQ--KEKHYLQTTITKEA---YDALSRKSAA 740 

+ +L K + L >+L D+ Q K H ++ + LS + A 

Sbjct: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNK5LEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKL 790 

Q D+ T+AL E +N +S+T ++ ++ Q + +++E ++ + +L 

Sbjct: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+LR+L + +LEE Q+ K + D++ L ++E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNI SFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

Sbjct: 1328 G — NQQQAAS EKESC - 1 TQ — LKKELSE NIKAVTLMKEELKEKKVEISSLSKQLTD 1378 

Query: 911 QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951 

Q+ LS ++ + S+ +E +L ++++ K + 

Sbjct: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422 

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25
Identities = 226/941 (24%), Positives = 444/941 (47%)

Query: 61 QALAFEESEVE--FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

Q L E+ +++ S+ LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENT — -GEKL— HLAQEQLALA 167 

+ QSL + + D+ ++E+ EN GE+ + + L 

SbjCt: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284 

Query: 168 GDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYT 227 
+ + EL ++ QS LL + + LQ +L + QE E D + + 

Sbjct: 285 QORVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERL-QELEKIKD--- LHMAE 340 

Query: 228 SPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASC 287 

+1 +++++++ Q +1 E + ++ L ++ E+ + +L++ 

Sbjct: 341 KTKLITQLRDAKNLIEQLEQDKGM VIAETKRQM--HETLEMKEEE-IAQLRSRIXQM 394 

Query: 288 TATH RYPPSSSEEC— -EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL RVE 335 

T R SE E+++K L Q+ ++++ E +K + R+ 

Sbjct: 395 TTQGEELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERIS 454 

Query: 336 LEA-VSEQKRNIMKDMMKL--ELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTET 392 

L+ +S K+ ++ D+MK E + L++ + RK++++T +LQ + EF E 
Sbjct: 455 LQQELSRVKQEVV-DVMKKSSEEQIAKLQKLHEKELARKEQELTK KLQTREREFQEQ 510 

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452 

K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+ 

Sbjct: 511 MKVALEKSQ--SEYLKISQEKEQ QESLALEELELQKKAI L-TESENKLRDLQQE- 561 

Query: 453 SKEAECKALQAEVQKLKNSLEEAKQQER LAAQQAAQCKEEAALAGCHLEDTQR-K 506 

++ + L+ E L+ SL+E K Q + L A++ KE + H + + K 

Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKE1TVMVEKHKTELESLK 620 

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRK-LEN 565 

Q+ L ++ Q+ 0 F++ L +E EKE K + + E K LE 

Sbjct: 621 HQQDALWTEKWJVLKQQYQTEMEKL-REKCEOEKETLLKDKEII-FQAHIEEMNEKTLEK 67B 

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR--E 621 

D ++ +L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K + 
Sbjct: 679 LDVKQTELESLSSE LSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQ 733 

Query: 622 QLKKS--KEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN---ENLRAELQCCSTQL 676 

Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ +L 

Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793 

Query: 677 ESSLNKYNTSQQVIQDLNKEI ALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736 

+ + K + Q +++ +E L LQ +L L+ E+ L TK+ + ++ 
Sbjct: 794 QQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERILL TKQVAEVEAQ 848 

Query: 737 KSAACQD DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ--LEEEIIAYEE 785 

K C + DL Q LEK N SE + +SLTQ E K + +E+ + 
Sbjct: 849 KKDVCTELDAHKIQVQDLMQQLEKQN SEMEQKVKSLTQVYESKLEDGNKEQEQTKQI 905 

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVL--QWQKQHQNDLKMLAAKEEQL 843 

++K N L+ G Q+ E+E+ +E S +L +++ + +H K + +++ 

SbjCt: 906 LVEKENMILQMREG--QKKEIEILTQKLSAKEDSIHILHEEYETKFKNQEKKMEKVKQKA 963 

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899 

+E QE LK+ LL+ ++ L++ L+ Q ++A+ 

Sbjct: 964 KEMQE TLKKKLLDQEAK LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK--EIEEKKMKAENTRL 955 

A +L +EQ+ + ++ E + + S+ L Q+ ++L++ EI+ ++ + E L 
SbjCt: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAEL 1076 

Query: 956 CTKALGPSRTESTQREKVCGTLGWKGLPQD 985 

K L E + K L +G+ QD 

Sbjct: 1077 KQKIL-LFGCEKEEMNKEITWLKEEGVKQD 1105 

Score = 326 (48.9 bits), Expect = 6.0e-25, P = 6.0e-25
Identities = 220/907 (24%), Positives = 444/907 (48%)

Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILE KQTS 123 

E+E G+S + QL Q +++ EL T+Y L++ + L+ Q+ 

Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182 

Query: 124 DLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNLYRD 183 

D L +L+Et + +++ H + E+ + E+ 1+ L+ ++L + 

SbjCt: 183 DKSL-RRI AELREE--LQMDQQAKKHLQ EEFDASLEE KDQYISVLQTQVSLLKQ 233 

Query: 184 KYQSSLSNIELLECQVKMLQGELGGIMGQE-PENKG-----DHSKVR-IYTSPCMIQEHQ 236 

+ ++ K+++L+ + L+ ♦ +E PE+ G D + V+ + T ++ + 

SbjCt: 234 RLRNGPMNVDVLK-PLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQE 292 

Query: 237 ETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPS 296 

KR E Q +Q L+ K A h ER + L K++ D T 

Sbjct: 293 NLLKRCKETIQSHKEQCTLLTS--EKEALQEQLO-ERLQELEKIK-DLHMAEKTKLIT— 34 6 

Query: 297 SSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELD 356 

+ D K +++ L++ K +E+++L++E++QR++KM + 
SbjCt: 347 QLRDAKN L I EQL EQDKGM - - V I AET KRQMH ET LEMKEEE I A- QLRS RI KQHTTQG E E 400 
Query: 357 LHGLREETS-AHIERKDKDITILQC RLQE LQLEFTETQKLTLKKDKFLQEKDEMLQ 411 

L +E++ A E +K ++ Q + +E L+ E E K T++K +E+ + Q 
Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TIEKTSE-EERISLQQ 45? 

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+++ K 0 + E E+ KE Q+ +K+ + + + + Q +K 
Sbjct: 458 ELSRVKQEVVDVMKKSSEEQIAKLQKLH-EKELARKE--QELTKKLQTREREFQEQ-MKV 513 

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+LE++ QEL Q++EAL L+ ++LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572 

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587 

+ ES+E+S VLE++ +++ +K K +L+ +QO + 
Sbjct: 57 3 ES-SLEKSLQENKNQSKDLAVH-LBAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE--QLKKSKEHEKLMEGELEALRQEF 644 

L +K Q+ + E ++ K E QE LL+DK Q + +EK +E +L+ + E 
Sbjct: 631 LQVLKQQYQTEMEKLREKCE QEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTEL 686 

Query: 645 KKKDKTLKE--NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE--IA 698 

+ L E +R KLEEE L+ + +LE+ +++ N QQ + + KE ++ 

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSI IKEHEVS 746 

Query: 699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS AACQDDLTQAL 749 

+Q+ E + Q QL+ L +++KH + E +A ++S A+ + D+ Q+ 

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 606 

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ H L Q Q+K LE E I +++ ++ + + + +++V 
Sbjct: 807 QSATH--EQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864 

Query: 810 AFDKKLEEMSCQVLQWQKQHQN--DLKMLAAKEEQLREFQEEMAALKENLL EDDKE 863 

++LE+ + ++ Q K + K+ +EQ E +++ KEN++ E K+ 

Sbjct: 8G5 DLHQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ--EQTKQILVEKEHMILQMREGQKK 922 

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE--KLGNQLREQV-NYIAK 917 

L Q S +D+ + N++ T + Q K +KV + ++ L++ + + + AK 
Sbjct: 923 EIEILTQKLSAKEDSIHIL— NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980 

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKALGPSRTESTQREKV 973 

L K L + + L Q+ K+ ++ E M N+ + A+ SR E+ Q+E+ + 
Sbjct: 981 L KKELENTALELSQKEKQFNAKMLE--MAQANSAGISDAV--SRLETNQKEQI 1029 

Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24
Identities = 184/827 (22%), Positives = 405/827 (48%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59 

++ E G + + S S + L+ ++ + ++ L++ ++ + D Q 

Sbjct: 1323 LQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q +++ EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSISLSEKEAAISSLR KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438 

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIA 172 

E K+ + H +KE ++ L + + ++ E+ ++L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD--EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNIEL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRIYTSPC 230 

L+ + + K + SN+E L+ Q + EL + Q+ E + + ++ Y 
Sbjct: 1497 CLKGEMEDDKSKMEKKESNLETELKSQTARIM-ELEDHITQKTIEIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290 

I EH+E ++L + ++D+ ++E K+ L LE + +K + + 

Sbjct: 1556 DI-EHKELVQKLQHFQELGEEKDNRVKEAEEKI LTLENQVYSMKAELETKKKELE 1609 

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + L+ + ++ +++++++ +L ♦ E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEK EE 1664 

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL--TLKKDKFLQEKD 407 

K + H E + ++ +++++ IL+ +L+ ++ +ET + + K E++ 
Sbjct: 1665 QYKKGTESH — LSELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQE 1722 

Query: 408 EM LQEL-EKKLTQVQNSLLKKEKEL EKQQCMATELEMTVK-EAKQDKSKE 455 

E +Q+ E+K++ +Q +L +KEK L EK++ +++ EM + + + K + 

Sbjct: 1723 EADSQGC VQKT Y E EK I S V LQRN LTEKEK LLQRVGQEKEET VS S H FEMRCQYQERLI K LEH 1782 

Query: 456 AEC KAL — QAEVQKLKN SLEEA KQQE RLAAQQAAQC K — E EAALAGCH LEDTQRKLQKG L 511 
AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L 
Sbjct: 1783 AEAKQHEDQSH1GHLQEELEEKKKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842 

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS — LELSEALRKLENSDKE 569 

++K T Q L+++++ L +S + +++ +R +EEL+ E +AL++++ +K 
Sbjct; 1843 ---QEKELTCQILEQKIKEL--DSCLVRQKEV-HRVEHEELTSKYEKLQALQQMDGRNKP 1896 

Query: 570 KRQLQKTVAEQD MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625 

L++ E+ + +L ++ QH + E + Q+ K + ++ L+ 

Sbjct: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685 

KEH++ ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 

Sbjct: 1957 RKEHQQ EL.EILKKEYDQ EREEKIKQEQEDL--ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMS LQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744 

Q Q+L I ++A+L ++ Q+E + L IEDLR+A ++ 

Sbjct: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMK--KLNTELRKLRGFH 801 

+ A E+ +T++ + LQ L + Q+K Q LE+E + + +L T+L + 

Sbjct: 2062 ILDAREE — EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119 

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 

+S+L+ F +++ + ++ +++K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 

Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24
Identities = 213/977 (21%), Positives = 454/977 (46%)

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61 

E R+ +V S+ K L+ Q + >+ +H++ +QK++L++ +++ 
Sbjct: 1034 EVHRRELNDVISIWEKKLHQQAEELQEIHEI-QLQEKEQEVAELKQKILLFGCEKEEMNK 1092 

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL VLQQE--LEFHTEELQTSYYSLRQY 114 

+ + + E G + L +LQ QLK+K + Q E L+ H E+L+ + 

Sbjct: 1093 EITWLKEE GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLKKSLKE 1149 

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENTGEKLHLAQEQLALAGDKI 171 

+ L++Q +L +L K K E+ + +E +++ EK + + E +L K+ 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKI. 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228 

+ L L++ K ++ L EL+ L I +++ K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINISSSKTNA1LSRI— SHCQHRTTKVKEALLIK 1267 

Query: 229 PCMIQEHQ ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280 

C + E + S Q L+ +Q+ + Q ++ ++++ A +LV E+E L 
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLHISFQQATHQLEEKENQIKSMKADIESLVTEKEA L 1323 

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q + + + S EC I ++ K L E ++ L EE +K+ +VE+ ++S 
Sbjct: 1324 QKEGGN QQQAASEKESC — ITQLKKELSENINAVTLMKEE LKEKKVEISSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQEL--QLEFTETQKLT-L 397 

+ q ++ + + l s+ + + D++ L ++Q+L +++ +K++ L 

Sbjct: 1374 KQLTDLNVQLQN-SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK-DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV---KEAKQDKS 453 

++ D + + E ++ + + TQ QN++ + + +LE + A E + + KE + + 
Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQH 1492 

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512 

K +C + E K K +E+ + L +Q A + E + +E ++ ++ K 
Sbjct: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

+ +QK +EL ++LQ Q+ + +++ L ++ +LE KE 

Sbjct: 1552 -NQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLEHQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL----EDKREQLKKSK 627 

+ +V ++ ++ + DR++ + + +K K E+ + K L E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL RAELQCCSTQLESSLNK 682 

E EL QE +++ L+E + +E ++E L A+ T+ E + ++ 

Sbjct: 1671 ESHL SEUJTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727 

Query: 683 YNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739 

T ++ I L + + +KE L+ Q tK H+ +E L A 

Sbjct: 1728 GCVQKTYEEKI SVL.QRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLI KLEHAEA 1785 
Query: 740 ACQDDLTQALEKLNHVTSET-- KSLQQSLTQTQEKKAQLEEEI I AYEERMKKLNTELRKL 797 

+D Q++ + H+ E K+ + SL Q + + + I ++ ++ + + + + K 
SbjCt: 1786 KQHED--QSM--IGHLQEELEEKHKKYSLIVAQHVEKEGGKNNIQAKQNLENVFODVQKT 1841 

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856 

QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K 

SbjCt: 1842 L---QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKYEKLQALQQMDGRNKPT 1897 

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915 

LLE++ E PK + ++ + L A+ ++K +KLG ++ + 
SbjCt: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK QKLGKEI VRLQKDL 1953 

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972 

L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+ 

SbjCt: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNST — LKQLMREFNTQLAQKEQ 2010 

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22
Identities = 221/952 (23%), Positives = 441/952 (46%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL---CMEEAMNSSHD- 56 

+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D 
Sbjct: 1160 LKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219 

Query: 57 --KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT-- -EELQTSYY 109 

KK L + +E + SSK L ++ + + +++ L T EL+ 
Sbjct: 1220 CCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279 

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE— QLAL 166 

L + Q+ L H + KE+++ + ++ EK L +E Q 

Sbjct: 1280 QLT EEQNT LN I S FQQAT KQLEEKEHQI KSMKADI ESLVTEKEALQKEGGNQQQA 1333 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

A +K E + + + +++ + L++ ++K + E+ + Q + V++ 
SbjCt: 1334 ASEK ESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384 

Query: 227 TSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFAS 286 

S + ++ + ++ + D +Q+L K+ + L E+ AL +■+ D+++ 
SbjCt: 1385 N S I S LS E KEAA I S S LRKQY DEE KC E LLDQVQDLS FKV DTLSKEKISALEQVD-DWSN 1440 

Query: 287 CTATHRYPPSS — SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337 

+ + S ++ +K++ L E K + +E NL+K+ R + L+ 

SbjCt: 1441 KFSEWKKKAQSRFTQHQNTVKELQIQl-ELKSKEAYEKDEQINLLKEELDQQNKRFDCLK 14 99 

Query: 338 AVSEQKRNIM-KDMMKLELDLHGLRE ETSAHIERKDKDITILQCRLQEL-QLEFTET 392 

E ++ M K LE +L E HI +K +1 L L+ Q + E 

SbjCt: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559 

Query: 393 QKLTLKKDKFLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449 

++L K F + EKD ++£ E+K+ ++N + + ELE ++ + ++VK 
Sbjct: 1560 KELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVK 1616 

Query: 450 QDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQK 509 

SKE E KAL+ ++ S + + +R A Q+ A K++ +E+ + + +K 

SbjCt: 1617 SKEEELKALEDRLES--ESAAKLAELKRKAEQKIAAIKKQLL SQMEEKEEQYKK 1668 

Query: 510 GLLLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568 

G + +T +QE +RE+ EQ+ + S+ A+E+D 

SbjCt: 1669 GTESHLSELNTKLQEREREVHILEEKLKSVESSQSETL--IVPRSAKNVAAYTEQEEADS 1726 

Query: 569 E KRQLQK-TVAEQDMKMND-MLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQ 622 

+ K +K +V ++++ + +L R+ Q +E+ ++ E Q +L+ K E 
SbjCt: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI--KLEH 1782 

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680 

+ +K+HE +MGLEL++KK +++ KE N++A+ LE 
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QNLE 1832 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKAL--QKEKHYLQTTITKEAYDALSR-K 737 

H ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L + 

SbjCt: 1833 NVFDDVQKTLQE--KELTCQ--ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888 

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEI IAYEERMKKLNTEL — 794 

++ T+ LE+ S++ +Q L E + LE ++ E +KL E+ 

SbjCt: 1889 QMDGRNKPTELLEENTEEKSKSHLVQPKLLSNMEAQHHDLEFKLAGAEREKQKLGKEIVR 1948 

Query: 795 --RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852 

+ LR +E + E+ K+ ++ + ++ Q+Q +LK + ++ +REF ++A 
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLMREFNTQLAQ 2007 

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912 
++ L KE Q V + + Q TN Q K K+A EK + R 
Sbjct: 2O08 KEQELEMTIKETINKAQ-EVEAELLESH QEETN — QLLK--KIA-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952 

Y L ++ + + + LQ + ++L+K+ ++K + EN 
Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22 
Identities = 195/961 (20%), Positives = 435/961 (45%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN--LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 

+KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 

Sbjct: 657 LKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 

+ E E + K H +Q+ + K+ V +Q+ + +++ L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG NHNENTGEKLHLAQEQLALAGDKIASL 174 

L++ + + L K E E+ ++ ++ T E+ +EQLA K+ L 

Sbjct: 772 LKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDL 831 

Query: 175 ERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQ-EPENKGDHSKVRIYTSPCMIQ 233 

E L + + + + ++ + ++ +M Q E +N KV+ T 

Sbjct: 832 ETERILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQ-VYES 890 

Query: 234 EHQETQKRLSEVWQKVSQQDDLIQELRN KLACSNALVLEREKALIKLQADFASCTA 289 

+ ++ K + Q + +++++I ++R ++ + +E ++ L ++ + 
Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHILNEEYET 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

++ + E +K+ K +QE + L E L K+L +S++++ 

Sbjct: 948 --KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA--KLKKELENTALELSQKEKQFNAK 1002 

Query: 350 HMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 

M+++ + + g+ + s +K++ ++ + +EL + +K ++ + LQE 

Sbjct: 1003 MLEMAQANSAGISDAVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIH 1062 

Query: 408 EM- LQELEKKLTQVQNS LLK KEKELEKQQCMATE LEMTVKEAKQD-KSKEAEC 458 

E+ LQE E+++ +++ +L +++E+ K+ E +T+E++ KKA 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+L + KLK LE+ + + ++ +E+ E+ +RK+ + L K K 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE--LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

T +E Q +K + E + +K EEL+++L +K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN--ELIN 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

K N +L RI H QHR K++E L T + + QL++ E + + 

Sbjct: 1238 ISSSKTNAILSRISHCQHRTT KVKEALLIKTCTVSELEAQLRQLTEEQNTLNISF 1292 

Query: 638 EALRQEFKKKD— KTLKENSRKLEEENENLR AELQCCSTQLESSL 680 

+ + ++K+ K++K + L £ E L+ +E + C TQL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSEHI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK +++ K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLMVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +++ E +++ ++L++ + + + + ++D + KE L E++A+E 
Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916 

LED + + T + N+ ++ N Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLNE-VLKNYNQ QKDIEHK---ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

+L EKD+ ++ L+ + +K E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22 
Identities = 207/886 (23%), Positives = 412/886 (46%) 






WO 01/12659 



PCT/IB00/01496 



Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106 

+ EN++ Q EEE+SK ++L + LQ+E + 

Sbjct: 1281 LTEEQNTLNISFQQATHQLEEKENQIKSMKA DIESLVTEKEALQKEGGNQQQAASE 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ +++ + N + A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAA- 1395 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

I+SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWK-K 1447 

Query: 227 TSPCMIQEHQETQKRLS EVWQKVSQQDDLIQEL--RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI--TILQCRLQEL 385 

LV+ L+ + + E+K N +K+ + L L A +E K K++ L + +E 

Sbjct: 1562 LVQKLQ-HFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+L+ E + L+ + + E+ ++ E+K+ ++ LL + +E E+Q TE ++ 
Sbjct: 1621 ELKALEDR LESES-AAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGTESHLSE 1676 

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504 

K + +E E L+ +++ +++S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYE 1735 

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E ++L A K 
Sbjct: 1736 EKIS---VLQRNLTEKEKLLQRVGQ--EKEETVSSHFEM--RCQYQERLIKLEHAEAKQH 1788 

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG--SIKCK--LE EDLQ E 611 

LQ+ + E++ K + ++ +H +E G +1+ K LE +D+Q E 
Sbjct: 1789 EDQSMIGHLQEELEEKNKKYSLIV--AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846 

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T ++LE K ++L +K + E+E L +++K + + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMDGRNKPTELLEENTEE 1906 

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L+KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDL-RMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYE 784 

I K+ YD R* Q+- + LE L H +*■ + + + + TQ +K+ +LE I + 
Sbjct: 1965 EILKKEYDQ-EREEKIKQEQ--EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E + K +L HQE E + KK+ E + + K+++ ++L A+EE++ 
Sbjct: 2018 ETINKAQEVEAELLESHQE ETNQLLKKI AEKDDDLKRTAKRYE EILDAREEEMT 2071 

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903 

+ + EL+++ LQ PD + ++TLQK +++ K 

Sbjct: 2072 AKVRDLQTQLEELQKKYQQK--LEQEENPGNDNVTIM ELQTQLAQ— KTTLISDSK 2123 

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ++ + +L + ++++ V HL 
Sbjct: 2124 LKEQE FREQI HNLEDRLKK Y EKNVYATTVGH L 2155 

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20 
Identities = 209/938 (22%), Positives = 432/938 (46%) 

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

++ ++ +E+ +L KLL + +K L + + +K Q N +E A NS+ 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFNAKMLEMAQANSACISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + EELQ + ++ + 

Sbjct: 1017 AVSRLETNQKE-QIESLTEVHRRELNDVISIWEKKLNQQAEELQ-EIHEIQLQEK-- 1069 

Query: 119 EKQTSDLV--LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E+ + + + L +L C+ +E ++ I + +E G + T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI--MGQEPENKGDHSKVRIYTSPCMIQ 233 
L ++K+ LN LE LQ +L + + +E + K ++ T+ Q 
Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT-- FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186 

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL--AC--SNALVLEREKALIKLQADFA 285 

H+++ K L + K + L +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED---KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 

+++ ++++I + ++Q 4 E QN + + E+K 

Sbjct: 1244 NAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKE 1303 

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402 

N +K M K +++ L +E + + + ♦ + +L+ E +E +TLK++ 

Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462 

L+EK + L K+LT + NL+ L +++ +L EK+ ++ L 

Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ--DLS 1418 

Query: 463 AEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+V L A +Q + + ++ K++A ++T ++LQ L L ++A 

Sbjct: 1419 FKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

+ I L+ EL K + E ++ ++E+ L +L++ +L+ + 

Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET---ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

++ +++ + + +k+ + +Q 1+ K h + LQ +L E+K ++K+++E +E ++ 
Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594 

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 

+*+• E+KKL+ + +++E L+A L+ +LES S K ++ + ++ 

Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE---DRLESESAAKL---AELKRKAEQK 1647 

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756 

IA K+ L+S Q++ +KE+ Y + T + L+ K + ++ EKL V 

Sbjct: 1648 IAAIKKQLLS--QME--EKEEQYKKGT--ESHLSELNTKLQEREREVHILEEKLKSVE 1699 

Query: 757 S ETKSLQQSLTQTQEKKAQLEEEII-AYEERMKKLNTELRKLRGFHQESELEV 808 

S ET +S + T++++A + + YEE++ L L EE + 

Sbjct: 1700 SSQSETLI VPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752 

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 

+ +• EE + + Q+Q L L E + E Q + L+E L E +K+ + 

Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLIV 1812 

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYIAKLSGEKDHL 925 

V K+ + N Q NLE + QK EK L Q+ EQ + + + + 

Sbjct: 1813 AQHVEKEGGK---NNIQAKQNLENVFDDVQKTLQEKELTCQILEQKIKELDSCLVRQKEV 1869 

Query: 926 HSV-MVHLQQENKKLK 940 

H V M L + +KL+ 
Sbjct: 1870 HRVEMEELTSKYEKLQ 1885 

Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14 
Identities = 160/716 (22%), Positives = 318/716 (44%) 

Query: 233 QEHQETQKRLSEVWQKVSQQDDLIQE-LRNKLACSNALV-LEREKALIKL-QADFASCTA 289 

+E +TQ ++ +V + L + ++ L S++ LR+ L + DSTA 
Sbjct: 53 RESGDTQSFAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLOLDSSTA 112 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

+ P E ED+ L +++ QL + + R+ + + + ++ 

Sbjct: 113 SFDPPSDMDSEAEDLVGNSDSLNKEQLIQRLR--RMERSLSSYRGKYSELVTAYQMLQRE 170 

Query: 350 MMKLELDLHGLREETSAHIERKDKDIT-ILQCRLQELQLEFTETQKLTLKKDKFLQEKDE 408 

KL+ G+ ++ +DK + I + R +ELQ++ + L + D L+EKD+ 

Sbjct: 171 KKKLQ GILSQS QDKSLRRIAELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAE---V 465 

+ L+ +++ ++ L ++ + + +LE + ++++ E++ + + + V 

Sbjct: 220 YISVLQTQVSLLKQRLRNGPMNVDVLKPLP-QLEPQAEVFTKEENPESDGEPVVEDGTSV 278 

Query: 466 QKLKNSLEEAKQQERLA--AQQAAQC-KEEAALAGCHLEDTQRKLQKGLL-LDKQKADTI 521 

+ L+ + K+QE L ++ Q KE+ L E Q +L + L L+K K + 

Sbjct: 279 KTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHM 338 

Query: 522 QELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQD 581 

E++L+ ++E++ +E ++EL E +R K+Q 

Sbjct: 339 AEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMHETLEMKEEEIAQLRSRIKQMTTQG 398 

Query: 582 [...]
K+ +B KL++ +E
Sbjct: 399 [...]

Query: 637 [...]
T ++ Q+ K
Sbjct: 459 [...]-TREREFQEQMK 512

Query: 696 [...]
+AL+K L+ +K Q+ + + K+A S DL Q E
Sbjct: 513 -VALEKSQSEYLKISQEKEQQESLALEELELQKKAILTESENKLR---DLQQEAETYRTR 568

Query: 755 VTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEV--HAFD 812
+ SL++SL QE K Q ++ + E K N E+ + H+ +ELE H D
Sbjct: 569 ILELESSLEKSL---QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHQQD 624

Query: 813 KKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLRE FQEEMAALKENLLED-DK 862
TO + + E LE D
QVL+ +Q+G +++ L K EQ +E
Sbjct: 625 [...]

Query: 863 [...]
Sbjct: 682 [...]

Query: 923 [...]
Q+ K LK +1 + ++
Sbjct: 740 [...]-QRTEKALKDQINQLEL 763

Score = 183 (27.5 bits), Expect = 1.3e-09, P = 1.3e-09
Identities = 132/584 (22%), Positives = 251/584 (42%)

Query: 409 [...]
K ++L++K+++ Q L +
Sbjct: 1 [...]

Query: 468 [...]-DTIQEL 524
L+ L Q E + + T +EN
D ++
Sbjct: 61 [...]

Query: 525 [...]-ELSEALRKLENSDKEKRQLQKTVAE 579
+ SE + + +EK++LQ +++
Sbjct: 121 [...]

Query: 580 [...]-DLQEATK LLEDKREQLKKSKEHEKL 632
L+E + +L+ + LK+ + +
Sbjct: 181 [...]ASLEEKDQYISVLQTQVSLLKQRLRNGPM 240

Query: 633 [...]-ENENLRAELQCCSTQLESSLNKYNTSQQ 688
E+ L+ +++ N + +
Sbjct: 241 [...]

Query: 689 [...]
+LQ QLD+ LQ E ++
Sbjct: 301 [...]

Query: 749 [...]
+ + +T E K EEEI R+K++ T+ +LR
K+ A +EQ++ +
Q+ + E
Sbjct: 358 [...]-RQMHETLEMK---EEEIAQLRSRIKQMTTQGEELR-[...]-EQKEKSE 409

Query: 808 [...]--EEMAALKENLLEDDKE 863
EE +L++ L +E
Sbjct: 410 [...]

Query: 864 [...]-EQVNYIAK 917
S++ L+ + K+ EEK
Q Y+ K
Sbjct: 466 [...]

Query: 918 [...]--KMKAENTRLCTKALGPSRTESTQREK 972
+ +AE R L S +S Q K
Sbjct: 525 [...]
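
The percentages printed in the HSP summary lines can be reproduced from the identity and positive counts. The following is a minimal Python sketch, not the scoring program's own code. It assumes that the percentages are truncated to whole numbers rather than rounded, which is an inference from the five summaries in this report (for example 132/584 is printed as 22%, not 23%). The helper name `hsp_percent` is hypothetical.

```python
def hsp_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated (assumed convention, see lead-in)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (identities, positives, aligned length) copied from the summary lines
# of the five HSPs reported for this query.
HSPS = [
    (195, 435, 961),
    (207, 412, 886),
    (209, 432, 938),
    (160, 318, 716),
    (132, 251, 584),
]

for identities, positives, length in HSPS:
    print(
        f"Identities = {identities}/{length} ({hsp_percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({hsp_percent(positives, length)}%)"
    )
```

Integer division is used so that the result does not depend on floating-point rounding.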



Pedant information for DKFZphtes3_lgl3, frame 1 



Report for DKFZphtes3_lgl3.1 



[LENGTH] 1007
[MW] 117480.77
[pI] 5.90
[HOMOL] TREMBL:AF092090_1 product: "cpl51"; Rattus norvegicus cpl51 mRNA, partial cds. 0.0
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134C] 5e-04
[EC] 3.6.1.32 Myosin ATPase 1e-16
[PIRKW] nucleus 3e-10
[PIRKW] phosphotransferase 6e-09
[PIRKW] duplication 2e-06
[PIRKW] citrulline 2e-12
[PIRKW] tandem repeat 1e-16
[PIRKW] endocytosis 2e-13
[PIRKW] heart 8e-13
[PIRKW] transmembrane protein 1e-13
[PIRKW] serine/threonine-specific protein kinase 6e-09
[PIRKW] zinc finger 2e-13
[PIRKW] metal binding 2e-13
[PIRKW] DNA binding 4e-12
[PIRKW] muscle contraction 1e-16
[PIRKW] acetylated amino end 1e-11
[PIRKW] actin binding 1e-16
[PIRKW] mitosis 5e-15
[PIRKW] microtubule binding 5e-15
[PIRKW] ATP 1e-16
[PIRKW] thick filament 1e-16
[PIRKW] phosphoprotein 4e-16
[PIRKW] skeletal muscle 2e-14
[PIRKW] calcium binding 2e-12
[PIRKW] alternative splicing 1e-16
[PIRKW] coiled coil 1e-16
[PIRKW] P-loop 1e-16
[PIRKW] heptad repeat 3e-10
[PIRKW] methylated amino acid 1e-16
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 2e-13
[PIRKW] cardiac muscle 8e-13
[PIRKW] hydrolase 1e-16
[PIRKW] microtubule 3e-10
[PIRKW] muscle 8e-13
[PIRKW] EF hand 2e-12
[PIRKW] cytoskeleton 2e-15
[PIRKW] hair 2e-12
[PIRKW] calmodulin binding 2e-13
[PIRKW] Golgi apparatus 3e-10
[SUPFAM] myosin heavy chain 1e-16
[SUPFAM] conserved hypothetical P115 protein 1e-07
[SUPFAM] centromere protein E 5e-15
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09
[SUPFAM] calmodulin repeat homology 2e-12
[SUPFAM] myosin motor domain homology 1e-16
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07
[SUPFAM] plectin 2e-07
[SUPFAM] trichohyalin 2e-12
[SUPFAM] pleckstrin repeat homology 8e-08
[SUPFAM] ribosomal protein S10 homology 2e-07
[SUPFAM] giantin 3e-13
[SUPFAM] protein kinase homology 6e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08
[SUPFAM] kinesin motor domain homology 5e-15
[SUPFAM] human early endosome antigen 1 2e-13
[SUPFAM] M5 protein 1e-07
[PROSITE] LEUCINE_ZIPPER 7
[PROSITE] MYRISTYL 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 20
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 15.00 %
[KW] COILED_COIL 42.40 %

SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccccccc 

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccccccccccccccc 

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCC CCCCCCCCCCCCCC 

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG ... XXXXXXXXXX xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC 

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE 

SEG XXXXXXXXXX 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC 

SEG 

PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 

COILS 



Prosite for DKFZphtes3_lgl3.1

PS00001 52->56 ASN_GLYCOSYLATION PDOC00001
PS00001 684->688 ASN_GLYCOSYLATION PDOC00001
PS00004 240->244 CAMP_PHOSPHO_SITE PDOC00004
PS00004 415->419 CAMP_PHOSPHO_SITE PDOC00004
PS00005 74->77 PKC_PHOSPHO_SITE PDOC00005
PS00005 110->113 PKC_PHOSPHO_SITE PDOC00005
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005
PS00005 290->293 PKC_PHOSPHO_SITE PDOC00005
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005
PS00005 396->399 PKC_PHOSPHO_SITE PDOC00005
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005
PS00005 503->506 PKC_PHOSPHO_SITE PDOC00005
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005
PS00005 566->569 PKC_PHOSPHO_SITE PDOC00005
PS00005 600->603 PKC_PHOSPHO_SITE PDOC00005
PS00005 650->653 PKC_PHOSPHO_SITE PDOC00005
PS00005 655->658 PKC_PHOSPHO_SITE PDOC00005
PS00005 735->738 PKC_PHOSPHO_SITE PDOC00005
PS00005 876->879 PKC_PHOSPHO_SITE PDOC00005
PS00005 968->971 PKC_PHOSPHO_SITE PDOC00005
PS00006 39->43 CK2_PHOSPHO_SITE PDOC00006
PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006
PS00006 444->448 CK2_PHOSPHO_SITE PDOC00006
PS00006 471->475 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 536->540 CK2_PHOSPHO_SITE PDOC00006
PS00006 566->570 CK2_PHOSPHO_SITE PDOC00006
PS00006 576->580 CK2_PHOSPHO_SITE PDOC00006
PS00006 650->654 CK2_PHOSPHO_SITE PDOC00006
PS00006 674->678 CK2_PHOSPHO_SITE PDOC00006
PS00006 804->808 CK2_PHOSPHO_SITE PDOC00006
PS00006 888->892 CK2_PHOSPHO_SITE PDOC00006
PS00006 963->967 CK2_PHOSPHO_SITE PDOC00006
PS00006 968->972 CK2_PHOSPHO_SITE PDOC00006
PS00007 135->143 TYR_PHOSPHO_SITE PDOC00007
PS00008 207->213 MYRISTYL PDOC00008
PS00008 599->605 MYRISTYL PDOC00008
PS00029 83->105 LEUCINE_ZIPPER PDOC00029
PS00029 90->112 LEUCINE_ZIPPER PDOC00029
PS00029 97->119 LEUCINE_ZIPPER PDOC00029
PS00029 104->126 LEUCINE_ZIPPER PDOC00029
PS00029 403->425 LEUCINE_ZIPPER PDOC00029
PS00029 410->432 LEUCINE_ZIPPER PDOC00029
PS00029 918->940 LEUCINE_ZIPPER PDOC00029



(No Pfam data available for DKFZphtes3_lgl3.1) 
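
The Prosite hits listed for this peptide can be checked by scanning the sequence with the corresponding patterns. The sketch below is an illustration only, not the Pedant pipeline. It uses the published PROSITE patterns for PS00001, PS00005 and PS00006 rewritten as regular expressions, and it assumes that hits are reported as a 1-based start followed by start plus motif length, which is consistent with entries such as 39->43 for the four-residue CK2 motif. The function name `scan` and the table `PATTERNS` are hypothetical.

```python
import re

# PROSITE patterns rewritten as regular expressions, with motif length.
PATTERNS = {
    "ASN_GLYCOSYLATION": ("N[^P][ST][^P]", 4),  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": ("[ST].[RK]", 3),       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": ("[ST].{2}[DE]", 4),    # PS00006: [ST]-x(2)-[DE]
}


def scan(sequence: str, name: str) -> list[tuple[int, int]]:
    """Return (start, start + motif length) pairs, 1-based, overlaps included."""
    regex, width = PATTERNS[name]
    # A lookahead makes the search report overlapping sites as well.
    finder = re.compile("(?=(" + regex + "))")
    hits = []
    for match in finder.finditer(sequence):
        start = match.start() + 1
        hits.append((start, start + width))
    return hits


# First 60 residues of the DKFZphtes3_lgl3 peptide (query residues 1-60).
first_60 = "MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA"

for site in PATTERNS:
    print(site, scan(first_60, site))
```

On these 60 residues the scan finds one ASN_GLYCOSYLATION site starting at 52, two CK2_PHOSPHO_SITE sites starting at 39 and 53, and no PKC_PHOSPHO_SITE, in line with the listing, whose first PKC hit starts at residue 74.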



DKFZphtes3_lkll 



group: cell structure and motility 
DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus 
actin-binding protein (ENC-1). 

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural 
induction in vertebrates. The protein is related to the kelch family of proteins and is expressed 
during early gastrulation in the prospective neuroectodermal region of the epiblast and later 
in development throughout the nervous system (NS). ENC-1 functions as an actin-binding protein 
that organises the actin cytoskeleton during neural differentiation and development of the NS. 
The novel protein is highly similar to ENC-1. 

The new protein can find application in modulation of cytoskeleton organisation in human 
testicular cells. 



strong similarity to mouse ENC-1 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3525 bp 

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499 



1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC 

51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC 

101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG 

151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC 

201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC 

251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC 

301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC 

351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG 

401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT 

451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT 

501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC 

551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG 

601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGG 

651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC 

701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA 

751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT 

801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC 

851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC 

901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA 

951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC 

1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA 

1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC 

1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG 

1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA 

1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT 

1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA 

1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA 

1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC 

1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG 

1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT 

1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC 

1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC 

1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA 

1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA 

1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC 

1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC 

1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG 

1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG 

1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC 

1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG 

2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGCTTTAG 

2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA 

2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG 

2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG 

2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG 

2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTCCCA CACATCCTTT 

2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC 

2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA 

2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC 

2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC 

2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT 

2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG 

2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT 

2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC 

2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC 
2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGAGGGAC CCCAGGGTGT 

2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG 

2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG 

2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC 

2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT 

3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC 

3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG 

3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA 

3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT 

3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC 

3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC 

3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC 

3351 AGCAGGAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG 

3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC 

3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA 

3501 ATAAAAAGAG TTGAGAAAAA AAAAA 
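
The polyadenylation signal and poly A stretch positions given in the entry header (3499 and 3515) can be located in the last rows of the sequence. The following Python sketch is a hedged illustration. It assumes the canonical AATAAA hexamer is the signal meant, that a run of ten A residues counts as the poly A stretch, and that the reported positions are 0-based offsets. The last assumption is an inference from this one sequence, not a documented convention of the annotation software. The function names are hypothetical.

```python
# Last 75 nucleotides of the insert (positions 3451-3525, 1-based),
# joined from the final rows of the sequence listing.
TAIL_FIRST_POSITION = 3451
tail = (
    "CATGGTGCTTTCCCCAAAAGTTGTGTTGCTTTTATCAGTTTTCTAACTTA"
    "ATAAAAAGAGTTGAGAAAAAAAAAA"
)


def zero_based_offset(index_in_tail: int) -> int:
    """Convert an index within `tail` to a 0-based offset in the full insert."""
    return (TAIL_FIRST_POSITION - 1) + index_in_tail


def find_polya_signal(seq: str, motif: str = "AATAAA"):
    """0-based offset of the first occurrence of the motif, or None."""
    index = seq.find(motif)
    return None if index < 0 else zero_based_offset(index)


def find_polya_stretch(seq: str, min_len: int = 10):
    """0-based offset of the first run of at least `min_len` A residues, or None."""
    index = seq.find("A" * min_len)
    return None if index < 0 else zero_based_offset(index)


print("polyadenylation signal at", find_polya_signal(tail))
print("poly A stretch at", find_polya_stretch(tail))
```

In 1-based coordinates the same features start at 3500 and 3516, one higher than the header values.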



BLAST Results 



No BLAST result 



Medline entries 



98350113: 

Cloning of human ENC-1 and evaluation of its expression 
and regulation in nervous system tumors. 

97252647: 

ENC-1: a novel mammalian kelch-related gene specifically expressed in 
the nervous system encodes an actin-binding protein. 

98234394: 

NRP/B, a novel nuclear matrix protein, associates with 
p110(RB) and is involved in neuronal differentiati 



Peptide information for frame 2 



ORF from 116 bp to 1882 bp; peptide length: 589 
Category: strong similarity to known protein 
Classification: Cell structure/motility 



1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL 
51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL 
101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL 
151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD 
201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV 
251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL 
301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS 
351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA 
401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG 
451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD 
501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ 
551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA 
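
The stated peptide length follows from the ORF coordinates, and the start of the peptide can be reproduced by translating the nucleotide sequence from position 116. The Python sketch below is illustrative and makes two assumptions: the ORF coordinates are 1-based and inclusive and exclude the stop codon, and the standard genetic code applies. The helper names are hypothetical.

```python
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Codon count for a 1-based, inclusive ORF that excludes the stop codon."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return nucleotides // 3


def translate(dna: str) -> str:
    """Translate complete codons of an upper-case DNA string in frame 1."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 116-148 of the insert, copied from the sequence listing.
orf_start = "ATGTCGGTCAGTGTCCATGAGACCCGCAAGTCG"

print(orf_peptide_length(116, 1882))
print(translate(orf_start))
```

The same arithmetic applies to the other ORF headers in this document, for example an ORF from 202 bp to 876 bp gives 225 residues.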



BLASTP hits 



Entry MMU65079_1 from database TREMBL: 

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus 
actin-binding protein (ENC-1) mRNA, complete cds. 

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589 
Entry AF059611_1 from database TREMBLNEW: 

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens 

nuclear matrix protein NRP/B (NRPB) mRNA, complete cds. 

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589 

Entry AF010314_1 from database TREMBL: 

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA, 
complete cds. 

Score = 1745, P = 7.8e-160, identities = 335/507, positives = 403/507 
Entry KELC_DROME from database SWISSPROT: 

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring 
canal protein"; Drosophila melanogaster ring canal protein and ORF2 
mRNA, complete cds. 

Score = 672, P = 3.9e-66, identities = 168/536, positives = 257/536 



Alert BLASTP hits for DKFZphtes3_lkll, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_lkll, frame 2 



Report for DKFZphtes3_lkll.2 



[LENGTH] 589
[MW] 65923.45
[pI] 6.10
[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus actin-binding protein (ENC-1) mRNA, complete cds. 0.0
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 2e-09
[BLOCKS] BL01016D Glycoprotease family proteins
[PIRKW] zinc finger 1e-08
[PIRKW] DNA binding 1e-08
[PIRKW] transcription factor 1e-08
[SUPFAM] POZ domain homology 3e-68
[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15
[SUPFAM] A55R protein 5e-29
[SUPFAM] hypothetical protein YHR158c 4e-08
[SUPFAM] A55R protein middle region homology 5e-29
[SUPFAM] myxoma virus M9-R protein 1e-14
[SUPFAM] A55R protein carboxyl-terminal homology 5e-29
[KW] Alpha_Beta



SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 

PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh 

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL 

PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh 

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR 

PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QSEDFNSLSKDTLLDLISSDELETEDERWFEAILQWVKHDLEPRKVHLPELLRSVRLAL 

PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc 

SEQ LPSDCLQEAVSSEALLMADERTKLIHDEALRCKTRILQNDGWTSPCARPRKAGHTLLIL 

PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee 

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV 

PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee 

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG 

PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc 

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ 

PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc 

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL 

PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee 

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA 

PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc 



(No Prosite data available for DKFZphtes3_lkll.2) 
(No Pfam data available for DKFZphtes3_lkll.2) 

DKFZphtes3_ln3 



group: signal transduction 

DKFZphtes3_ln3 encodes a novel 1196 amino acid protein with similarity to the S. pombe Tup1 
protein. 

The protein contains one WD-40 repeat, which is typical of the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. In addition, an RGD site is present. 

The new protein can find application in modulating/blocking G-protein-dependent pathways. 



similarity to Tup1p 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="6q24" 
Insert length: 5277 bp 

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244 
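
The two positions reported above can be checked against the end of the insert sequence. The following is a minimal sketch, not the annotation pipeline's own method: the function name `find_polya_features` is hypothetical, the canonical AATAAA hexamer is assumed to be the signal meant, and the reported positions are assumed to be 0-based offsets, which is what the last 77 bases of the listing (numbered 5201 to 5277) suggest.

```python
import re

def find_polya_features(tail: str, offset: int):
    """Return 0-based positions of the first AATAAA signal and of the trailing poly(A) run.

    `tail` is the 3' end of the cDNA; `offset` is the 0-based index of its first base
    within the full insert. Both names are illustrative assumptions.
    """
    signal = tail.find("AATAAA")        # canonical polyadenylation hexamer
    run = re.search(r"A+$", tail)       # uninterrupted run of A at the very end
    signal_pos = offset + signal if signal >= 0 else None
    polya_pos = offset + run.start() if run else None
    return signal_pos, polya_pos

# Bases 5201..5277 of the insert, copied from the sequence listing.
TAIL = ("AAGCTTGTTTTTTTCTGTCTTGTGAATGCACTTGATAATTTAAAAATAAA"
        "AATATCTGTTTCTCTGCAAAAAAAAAA")

print(find_polya_features(TAIL, 5200))
```

Under these assumptions the sketch yields the signal at 5244 and the poly(A) start at 5267, in agreement with the header; with 1-based numbering both values would be one higher.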



1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA 
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC 
101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC 
151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG 
201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA 
251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG 
301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC 
351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA 
401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA 
451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC 
501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG 
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC 
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA 
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG 
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC 
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG 
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC 
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA 
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT 
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA 
1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC 
1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT 
1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT 
1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA 
1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA 
1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG 
1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC 
1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA 
1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT 
1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT 
1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT 
1551 TGTTGAGGCA TTTGAATGCT GGTCAAAATG TCCAAGAAAT CATTACCCAT 
1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG 
1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT 
1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT 
1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT 
1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG 
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT 
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA 
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC 
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG 
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA 
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA 
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG 
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA 
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC 
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA 
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA 
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA 
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG 
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG 
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG 
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG 
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT 
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA 
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG 
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA 
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT 
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC 
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT 
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC 
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC 
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG 
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA 
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA 
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA 
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT 
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC 
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA 
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC 
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA 
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA 
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG 
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG 
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT 
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG 
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA 
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC 
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG 
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG 
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT 
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC 
4101 ATCCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA 
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT 
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC 
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG 
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT 
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC 
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA 
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC 
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC 
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA 
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA 
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA 
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG 
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA 
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC 
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT 
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGAAATATC 
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA 
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT 
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC 
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG 
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC 
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA 
5251 AATATCTGTT TCTCTGCAAA AAAAAAA 



BLAST Results 



Entry HS32B1 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1 
Score = 4445, P = 0.0e+00, identities = 889/889 

Entry U93816 from database EMBL: 

Human exon-trapped sequence from 6q24. 

Score = 965, P = 4.0e-35, identities = 193/193 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein 

1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR 
51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK 
101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV 
151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM 
201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS 
251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK 
301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH 
351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI 
401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL 
451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY 
501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM 
551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK 
601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL 
651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV 
701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL 
751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG 
801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST 
851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN 
901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC 
951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN 
1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV 
1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY 
1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS 
1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_ln3, frame 1 

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces 
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete 
cds., N = 1, Score = 186, P = 1e-10 

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa 
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18 

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin"; 
S.pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14 

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N = 
2, Score = 228, P = 1e-13 

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa 
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18 

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin"; 
S.pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14 

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1, 
N = 1, Score = 215, P = 2.3e-13 

SWISSPROT:YZLL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13 



>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6 
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein 
(Pmc733) mRNA, complete cds. 
Length = 321 

HSPs: 



Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18 
Identities = 59/225 (26%), Positives = 111/225 (49%) 
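
The percentages in the HSP summary follow directly from the match counts. A minimal sketch of that arithmetic, assuming simple rounding to the nearest integer (the helper name `percent` is hypothetical and the report generator may truncate instead):

```python
def percent(fraction: str) -> int:
    """Convert a 'matches/aligned_length' string such as '59/225' to a whole percentage."""
    matches, aligned = (int(part) for part in fraction.split("/"))
    return round(100 * matches / aligned)

# Values taken from the HSP summary line above.
print(percent("59/225"), percent("111/225"))
```

Both results agree with the 26% identities and 49% positives printed in the report.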



Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706 

+ E GH + I DLSWSK+ +L++S D T R+W ++ + +V H ++V +F+ 
Sbjct: 63 VHEFYGKGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119 

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766 

P +TGC D ++RIW V LV + K + ++C+ +G +G TG 
Sbjct: 120 PTNGNYFI TGC IDGLVRIWDVRK CLVV DWANS KE IVTAVC Y RP DGKGA VAGT I TG 174 

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826 

+ + +LE V ++N K + + Y P K+L++ + D+ +RI 
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229 

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871 

+D +++ +G+ +++TPG++ S+D +Y+WN 
Sbjct: 230 LDGAHVISN- YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272 



Pedant information for DKFZphtes3_ln3, frame 1 



Report for DKFZphtes3_ln3.1 

[LENGTH] 1196 

[MW] 137114.70 

[pI] 6.79 

[HOMOL] SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN 
C14B1.4 IN CHROMOSOME III. 8e-21 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11 

[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c 
TAF90 - TFIID subunit] 4e-10 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 
4e-10 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-08 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 
9e-08 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, 
YDL145c] 9e-08 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06 

[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 4e-06 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YMR116c] 4e-06 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 4e-05 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 
4e-05 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 4e-05 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YCR088w] 6e-05 

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05 

[BLOCKS] BL00024H 

[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-91 

[SCOP] d1gfc 2.21.2.1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14 

[SCOP] d1fmk_1 2.21.2.1.8 (1-64) c-src tyrosine kinase [human (Hom 5e-15 

[SCOP] d1ad5b1 2.21.2.1.7 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15 

[SCOP] d1lcka1 2.21.2.1.16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13 

[SCOP] d1qwea 2.21.2.1.15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15 

[SCOP] d1shg_2 2.21.2.1.6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13 

[SCOP] d1prmc_ 2.21.2.1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15 

[SCOP] d1hsq 2.21.2.1.12 Phospholipase C, SH3 domain [human (Hom 2e-13 

[SCOP] d1aboa_ 2.21.2.1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13 

[SCOP] d1efna_ 2.21.2.1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15 

[SCOP] d1sema_ 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13 

[SCOP] d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16 

[SCOP] d1ckaa_ 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15 

[EC] 3.1.4.3 Phospholipase C 2e-07 

[EC] 3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07 

[EC] 3.6.1.32 Myosin ATPase 7e-07 

[EC] 2.7.1.112 Protein-tyrosine kinase 8e-06 

[PIRKW] nucleus 2e-08 

[PIRKW] phosphotransferase 8e-06 

[PIRKW] plasma 4e-07 

[PIRKW] duplication 4e-07 

[PIRKW] phosphoric diester hydrolase 2e-07 

[PIRKW] tandem repeat 7e-07 

[PIRKW] hormone 4e-07 

[PIRKW] transmembrane protein 2e-06 

[PIRKW] stomach 4e-07 

[PIRKW] actin binding 7e-07 

[PIRKW] ATP 7e-07 

[PIRKW] phosphoprotein 7e-07 

[PIRKW] signal transduction 7e-09 

[PIRKW] heterotrimer 7e-09 

[PIRKW] P-loop 7e-07 

[PIRKW] hydrolase 7e-07 

[PIRKW] transcription regulation 5e-06 

[PIRKW] GTP binding 7e-09 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07 

[SUPFAM] SH3 homology 2e-07 

[SUPFAM] SH2 homology 2e-07 

[SUPFAM] protozoan myosin heavy chain IB 7e-07 

[SUPFAM] myosin motor domain homology 7e-07 

[SUPFAM] pleckstrin repeat homology 2e-07 

[SUPFAM] protein-tyrosine kinase src 8e-06 

[SUPFAM] WD repeat homology 3e-12 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07 

[SUPFAM] protein kinase homology 6e-06 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07 

[SUPFAM] GTP-binding regulatory protein beta chain 7e-09 

[SUPFAM] yeast coatomer complex alpha chain 4e-07 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 25 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 19 

[PROSITE] ASN_GLYCOSYLATION 6 

[PFAM] Src homology domain 3 

[PFAM] WD domain, G-beta repeats 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 5.77 % 

[KW] COILED_COIL 2.42 % 



SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT 

SEG xxxxxxxx 

COILS ccccccccccccccccccccccccccccc 

IgotB 



SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED 

SEG 

COILS 

IgotB 

SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTHQKTHTKPQPGVDHQKSEKANEGREET 

SEG xxx 

COILS 

IgotB 

SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE 

SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx 

COILS 

IgotB 

SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK 

SEG xxxxxxxxxx xxxx 

COILS 

IgotB 

SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM 

SEG xxxxxxxxx 

COILS 

IgotB 

SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW 

SEG 

COILS 

IgotB 

SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK 

SEG 

COILS 

IgotB 

SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP 

SEG 

COILS 

IgotB 

SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK 

SEG 

COILS 

IgotB 

SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL 

SEG 

COILS 



SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS 

SEG 

COILS 

IgotB EETTTTTEEEEEETTTEEEEEETT — TTCEEEEEETTTCBEEEEETTT-TCEEEEEETTT 

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL 

SEG 

COILS * 

IgotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEEEEEE 

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA 

SEG 

COILS 

IgotB 

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN 

SEG 

COILS 

IgotB 

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS 

SEG 

COILS 

IgotB 

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN 

SEG 

COILS 

IgotB 

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV 

SEG 

COILS 

IgotB 

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP 

SEG 

COILS 

IgotB 

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE 

SEG 

COILS 

IgotB 



IgotB 



CEEEEEECCCCCEEEE 



Prosite for DKFZphtes3_ln3.1 

PS00001  460->464    ASN_GLYCOSYLATION  PDOC00001 
PS00001  686->690    ASN_GLYCOSYLATION  PDOC00001 
PS00001  934->938    ASN_GLYCOSYLATION  PDOC00001 
PS00001  1000->1004  ASN_GLYCOSYLATION  PDOC00001 
PS00001  1065->1069  ASN_GLYCOSYLATION  PDOC00001 
PS00001  1148->1152  ASN_GLYCOSYLATION  PDOC00001 
PS00004  91->95      CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  264->268    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  305->309    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  1190->1194  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  48->51      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  66->69      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  93->96      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  170->173    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  232->235    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  268->271    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  304->307    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  327->330    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  352->355    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  384->387    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  440->443    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  533->536    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  546->549    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  643->646    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  677->680    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  690->693    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  702->705    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  823->826    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  973->976    PKC_PHOSPHO_SITE   PDOC00005 
PS00006  22->26      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  59->63      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  77->81      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  116->120    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  137->141    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  180->184    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  245->249    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  276->280    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  283->287    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  288->292    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  292->296    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  327->331    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  390->394    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  454->458    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  510->514    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  570->574    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  663->667    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  672->676    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  804->808    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  985->989    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1023->1027  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1127->1131  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1132->1136  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1161->1165  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1170->1174  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  211->219    TYR_PHOSPHO_SITE   PDOC00007 
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  210->219    TYR_PHOSPHO_SITE   PDOC00007 
PS00008  483->489    MYRISTYL           PDOC00008 
PS00008  577->583    MYRISTYL           PDOC00008 
PS00008  716->722    MYRISTYL           PDOC00008 
PS00008  800->806    MYRISTYL           PDOC00008 
PS00008  861->867    MYRISTYL           PDOC00008 
PS00008  941->947    MYRISTYL           PDOC00008 
PS00009  811->815    AMIDATION          PDOC00009 
PS00009  1188->1192  AMIDATION          PDOC00009 
PS00016  1074->1077  RGD                PDOC00016 
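
Hits of this kind can be reproduced by scanning the peptide with the short PROSITE consensus patterns. The following is a minimal sketch rather than the tool that produced the listing: the regular expressions are hand-translated from the PROSITE patterns for PS00001, PS00005, PS00006 and PS00016, the function name `scan` is hypothetical, and the end coordinate is taken to be start plus pattern length, which is an assumption inferred from entries such as RGD at 1074->1077.

```python
import re

# PROSITE consensus patterns rewritten as regular expressions (assumed translations).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",   # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",        # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",       # [ST]-x(2)-[DE]
    "RGD": r"RGD",                           # R-G-D
}

def scan(fragment: str, first_residue: int):
    """Return (name, start, end) hits, numbered from the residue number of fragment[0]."""
    hits = []
    for name, pattern in PATTERNS.items():
        # The lookahead lets overlapping sites be reported as well.
        for match in re.finditer(f"(?=({pattern}))", fragment):
            start = first_residue + match.start()
            hits.append((name, start, start + len(match.group(1))))
    return sorted(hits, key=lambda hit: hit[1])

# Residues 1061..1080 of the peptide listed for frame 1.
for hit in scan("DYTANRSDELTIHRGDIIRV", 1061):
    print(hit)
```

On this fragment the sketch reports an ASN_GLYCOSYLATION site at 1065->1069 and the RGD site at 1074->1077, and no PKC or CK2 sites, which agrees with the listing for that stretch of the sequence.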



Pfam for DKFZphtes3_ln3.1 



HMM_NAME WD domain, G-beta repeats 

HMM * Mr GHnnWVWCVa FS P DG rW FI vSGSWDgTCRLWD * 

+ GH+N ++++++S 0 ++ I+++S DGT R+W 
Query 650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK 



HMM_NAME Src homology domain 3 

HMM *pyVIALYDYqAqdpDELSFkEGDIIiIIEdsDD. WWrgRnnnTNGQEGW 

P+V+ALYDY+A+++DEL++ +GDII + ++++ WW+G GQEG+ 
Query 1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 

HMM IPSNYVEPi* 
+P+N V+ + 

Query 1101 FPANHVASE 1109 

DKFZphtes3_20c21 



group: testes derived 

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

Sequenced by MediGenomix 
Locus: /map="22q11.2-12.2" 
Insert length: 3997 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 

1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 

101 AGTCCCGAAC CCCGAGCACT GGGATGCCTG GCTACTCCGA GCCAAGGCAC 

151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 

201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 

251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 

301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 

351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 

401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 

451 ATGCTGAGTG AAAACTAACC AACTGGCAGT TGTGACTGAA AACACTGAAA 

501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 

551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 

601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 

651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 

701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 

751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 

801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 

851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 

901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 

951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC CTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC 
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTC CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG CAGTTCAGGC 
2151 TCAGCAAACT GTCAGGGTGC TGGCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG 
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT 
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT TGCAGATTGG 
2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA GACCCAGAGC 
2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC GGGACGGGCT 
3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA TCTCCTGCTG 
3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT GGCATTTTTT 
3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC AACCCCCACA 
3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC GGATCTGTGA 
3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC AGTGCGCTCT 
3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC ACACGGAGAC 
3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG GGCGTCTTCT 
3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG GAGTTTGGGC 
3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT GAGAATTCAG 
3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG GGCCGTGCAG 
3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC AAGATGTCTG 
3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGAA AGGGAGCGAC 
3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC CTCAGTTCTA 
3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG ACTGCAGGAT 
3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA GTACTTGAGA 
3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT GAACTGATAT 
3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT 
3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry HS1048E9 from database EMBLNEW: 

Human DNA sequence from clone 1048E9 on chromosome 22q11.2-12.2 
Contains pseudogene similar to ribosomal protein S3A and part of a gene 
similar to C.elegans protein CE02118, ESTs, STS, GSS. 
Score = 6540, P = 0.0e+00, identities = 1308/1308 
-14 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 
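
The peptide length follows from the ORF coordinates by simple arithmetic. A minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range (the function name `peptide_length` is hypothetical):

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates without the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError(f"ORF span {span} is not a multiple of 3")
    return span // 3

print(peptide_length(618, 2741))   # ORF of DKFZphtes3_20c21, frame 3
print(peptide_length(19, 3606))    # ORF of DKFZphtes3_ln3, frame 1
```

Under these assumptions the two calls give 708 and 1196, matching the peptide lengths stated for the two clones.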



1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 
51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL 
101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 
151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI 
201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 
251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 
301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL 
351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 
401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS 
451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES 
501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG 
551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS 
601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT 
651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL 
701 LKHGVNLL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 



Report for DKFZphtes3_20c21.3 

[LENGTH] 708 

[MW] 76900.23 

[pI] 5.30 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.36 % 

SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG . xxxxxxxxxxxx ............ 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN 

SEG 

PRD eeeeeeccccccchhhhhhhhheeeeccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH 

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh 

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG 

SEG xxxxxxxxxxxxxxxxxxxxx .... 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS 

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA 

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhccceeeee 

SEQ CCNPIQETYFQQLAPAARSSGFPHPQDGAFSLSGKAKQKLLKHGVHLL 

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 

(No Prosite data available for DKFZphtes3_20c21.3) 
(No Pfam data available for DKFZphtes3_20c21.3) 

DKFZphtes3_20k2 



group: signal transduction 

DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid 
receptor subtype 1 (VR1). 

VR1 seems to play an important role in the activation and sensitization of nociceptors. It is 
the receptor for, e.g., capsaicin, a natural product of capsicum peppers and a selective 
activator of nociceptors. The novel protein is the human orthologue of rat VR1. 

The new protein can find application as a target for the development of new nociception- 
modulating drugs. 



strong similarity to rat vanilloid receptor subtype 1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135 



1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA 
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT 
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT 
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT 
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG 
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG 
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG 
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC 
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC 
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA 
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT 
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT 
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT 
651 GCCAGCATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC 
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT 
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC 
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC 
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT 
901 CGAGAGACGC AACATGGCCC TGGTGACCCT CCTGGTGGAG AACGGAGCAG 
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGCCGG 
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA 
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG 
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG 
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA 
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG 
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG 
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA 
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC 
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG 
1451 AACTCGGTCC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG 
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT 
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC 
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG 
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG 
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT 
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG 
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG 
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC 
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA 
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC 
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA 
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA 
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT 
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC 
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT 
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA 
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG 
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA 
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC 
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC 
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT 
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT 
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG 
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC 
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG 
2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC 
2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG 
2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG 
2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT 
2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG 
3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG 
3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA 
3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA 
3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT 
3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA 
3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG 
3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG 
3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT 
3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT 
3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC 
3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG 
3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG 
3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA 
3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CACGTGTGAG 
3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA 
3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA 
3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG 
3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG 
3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC 
3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG 
4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT 
4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG 
4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT 
4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



no blast result 



Medline entries 



99288727: 

Recent advances in neuropharmacology of cutaneous nociceptors. 
99231880: 

A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates 
rat dorsal root ganglion 

neurons via interaction at vanilloid receptors. 



Peptide information for frame 2 



ORF from 272 bp to 2788 bp; peptide length: 839 
Category: strong similarity to known protein 
Classification: Cell signaling /communication 
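
The ORF coordinates and the peptide length reported above are tied together by simple arithmetic: a range given as 1-based, inclusive positions covers end - start + 1 nucleotides, and one third of that is the number of codons. The following is a minimal sketch of that consistency check, assuming the reported range excludes the stop codon; the function name orf_codons is hypothetical and not part of the original report.

```python
def orf_codons(start: int, end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates and a range that
    does not include the stop codon (an assumption about the report).
    """
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not be smaller than start")
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    # Coordinates as reported for frame 2 of this clone.
    print(orf_codons(272, 2788))
```

Applied to the range 272 to 2788 this gives 839, which agrees with the stated peptide length.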



1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF 
51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD 
101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK 
151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY 
201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE 
251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA 
301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL 
351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI 
401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT 
451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR 
501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT 
551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE 
601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF 
651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL 
701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW 
751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA 
801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK 



BLASTP hits 



694 



WO 01/12659 



PCT/IB00/01496 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20k2, frame 2 

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1, 
Score = 3760, P = 0 

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel 
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective 
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219 



>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds. 
Length = 838 

HSPs: 

Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 721/839 (85%), Positives = 773/839 (92%) 
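
The percentages in parentheses follow from the two ratios on the same line. They appear to be truncated rather than rounded (721/839 is roughly 85.9%, yet 85% is reported); this is an inference from the figures shown here, not something the report states. A minimal sketch under that assumption, with the hypothetical helper name blast_percent:

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching positions, truncated toward zero.

    Truncation (rather than rounding) is an assumption inferred from the
    figures in the report above.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and aligned")
    return (100 * matches) // aligned


if __name__ == "__main__":
    print(blast_percent(721, 839))  # identities
    print(blast_percent(773, 839))  # positives
```

With the counts above the helper returns 85 and 92, matching the reported values.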

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+++C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P 
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120 

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+ 
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119 

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180 

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++ 
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179 

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240 

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT 
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239 

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300 

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT 
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299 

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360 

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E 
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359 

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420 

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419 

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479 

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++ +GDYFRVTGEI 
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKMTVGDYFRVTGEI 479 

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539 

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA 
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539 

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599 

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI 
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599 

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659 

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658 

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719 

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718 

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779 

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778 

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839 

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK 
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838 

Report for DKFZphtes3_20k2.2 



[LENGTH] 839 

[MW] 94950.75 

[pI] 6.90 

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus 
vanilloid receptor subtype 1 mRNA, complete cds. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05 

[PIRKW] alternative splicing 3e-06 

[PIRKW] peripheral membrane protein 3e-06 

[SUPFAM] ankyrin repeat homology 3e-06 

[SUPFAM] unassigned ankyrin repeat proteins 3e-0$ 

[PFAM] Ank repeat 

[KW] TRANSMEMBRANE 4 



SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh 

MEM 

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 

SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS 

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE 

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec 

MEM MMMMMMMMMMMMMMMMM. 

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF 

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MM 

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM 

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS 

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 



(No Prosite data available for DKFZphtes3_20k2.2) 

DKFZphtes3_20i3 



group: transmembrane protein 

DKFZphtes3_20i3 encodes a novel 595 amino acid protein with partial similarity to the IL-17 
receptor. 

The novel protein contains one transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp 

Poly A stretch at pos. 2345, no polyadenylation signal found 



1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 
51 AGCCAGCAGA AACAGTGGGC TGTACAACAT CACCTTCAAA TATGACAATT 
101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 
151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 
201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 
251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 
301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 
351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 
401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 
451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG 
501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 
551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 
601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 
651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 
701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA 
751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG 
801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 
851 TCGCGACGCT CTTCACTGTG ATGTGCCGCA AGAAGCAACA AGAAAATATA 
901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 
951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 
1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 
1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 
1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 
1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 
1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 
1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 
1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 
1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 
1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 
1451 CCCGAGACCA CGGCCTCCAG GAGCCGGGGC AGCACACGCG ACAGGGCAGC 
1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 
1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 
1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 
1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 
1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 

1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 
1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 
1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG 
1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG 
1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 
2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC CTTCCTTCCA 
2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 
2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA 
2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 
2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 
2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA 
2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2401 AAAAAA 



BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 

1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL 

51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR 

101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW 

151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT 

201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW 

251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS 

301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL 

351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV 

401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK 

451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ 

501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE 

551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20i3, frame 1 

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14 

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus 
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P = 
1.1e-13 

>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds. 

Length = 866 



Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14 
Identities = 85/284 (29%), Positives = 131/284 (46%) 



KV++ YS+ D +++W FA FL CG EVALDL E+ ++ 



- FVDKKN YXXXXXXXXXXXX ELFLVAVS AI AEXXXXXXXXX 324 
+ + +LF A++ I 



++ YF + SC+GDVP + 



Query: 213   Sbjct: 379 
Query: 269   Sbjct: 438 
Query: 325   Sbjct: 498 
Query: 384   Sbjct: 551 
Query: 435   Sbjct: 611 

[RNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQFVPFHPPPLRYREPV 434 

NY RS GR L A+ + PDWFE + + PL + EP+ 



+G+V ++PSCL++VGGA H 
GTGIVKRAPLVRE-PGSQACLAIDPLV-GEEGGAAVAKLEPH- 

Pedant information for DKFZphtes3_20i3, frame 1 

2551 ACGCAAGATT GCCAAATTCG AGGAGAAGCA CTTGTCGAGT TTAAGTGCCA 
2601 TTCGAGAGGA GTTGGAACTG CCCAACATTG AGAAGATGAT CCTAGAATGC 
2651 AGTGCTGACA TCAGTGAGTT GTTCGATGCG CTCATGACGC TGGAGATGCA 
2701 GCTGGTGGAG CAGCTGGAGG TAAGGCTGGG CCCTGGGCAC AAGTGCCAGA 
2751 ATCTGGCGAT GCAGCTGCAC ATCCATAGGT GAACTGTAGC CTTCATGGGC 
2801 ACGCCTCTGC TGGAAACGTC CAGCACGACT CAGCGTGGCA GGCTGTAGCT 
2851 TTCTTGCTCA TCAGTCCTGT TTGCTTTTAT TACATTTTAA TCATTTACAT 
2901 TGGAAGTGAT TCTTGTGGAA AATGAGAGGT GAGCTCATTC TTCTGAAATG 
2951 GTCCCCCTAT CCTGGAAGTC AGTGGGGAGA GGTTTTTGAT TAGACCCCTG 
3001 GAGCTATCCG GGTACTCTAA AGGCAAAGCG CACCCCCACT TGGGGACCAA 
3051 ACAAAGACCC CTCCGCATTG CAGCCTGCAG TTGCCGCTTC TCAGGTGACG 
3101 TGAGGAGGCT GCAACTCAGC ACTAAGTAGT GAAAATGAAA AGCGCCGCTG 
3151 TCTGAAATTC ATTAGCAGCC AGAGTATGTG TTACAAGGCA GCGGAGGCTG 
3201 GGAGTCTGAA GTGGTGTGAT GAATTGAACC TCATCGGATG CTGCTGTGGC 
3251 TGGGCCAAGT GATAGCACCT AATCAATTCC TCACACGTCA AGTGACACCT 
3301 CAGACATGGG ATAGATTTCC CCATCACATC ACAGGGCAGG TGCTCCCTCC 
3351 CTGCTGGAGA GCACAGGCAC TGCAGAAGCA GCGCACAGTG CCAGGGGCGA 
3401 GTGAGGCAGC AGCTCCCAGC CTTTTCAGGC ACGGAGATTG CCTTTCAACA 
3451 TCCAAACATT TCCCAGAACC CATGTGCCAT CCTACTTGTA TTACTGGTGG 
3501 CCAGAAAGCC ACAAGCGCAA TCATGCTTTT CAATGACCCT ATTTTTATTC 
3551 ACGAGAACAG CACATACATG TGTTTGAAAA TTATGTGAGG TGCTCACTCT 
3601 GCAGACAGTA CTCACATTCC TATAGATTCC ACCCCTGCCC ACCTTGCAGC 
3651 CCCTGGAGTC TATAGCAGAT GGGAGTGGGG CACTCCGAGA GTGGCAGGCC 
3701 TGGAGATCAC ATCTTCCATT GTTCCTTCAA TCAACACTAA CTCCCATTTG 
3751 GGCCTTAGGT GCCTTGCTAA GCACCACAAA ACAGCAACTA ACTGAAAGAG 
3801 ATCTGGAGTG CCAGCCCGCT CCTACTGAGG GCCTCCTCTC TGTCAGGCAC 
3851 CTTGCAAAGC ATTTTGTGTG AAGTGACTCA TTTAACCTCA CCACAACGCC 
3901 ACAACGCAGG GATTATGCAG GTAACCTATT TCCCAGATGA GGAAGATAAG 
3951 GCCCAAGGAG GTGAAATGCC TTTCCCAGAG TTACACAGAG TGCTGGAGCT 
4001 GGGAATACTG ACCCAGGCAG TCTAGCTCTT AACAGCTCAC TCCACTGTTT 
4051 CCCTGGAGGT GATGCACAGA TGTCACTGGG AAACCCAAAG GAGAGGGGGT 
4101 TGGCTGTGTG TGTGTGTGTT GGGCAGGCAG GTAAGGGGAG TAAGACCAGG 
4151 ACAAGTGTTC CTGGCAAAGT TCCGGTGACA GCATTAAACA TTCAGATGGT 
4201 GAGGGAGTTA ATATGGTTGG AGAACAACAA CTTTAGAGAG AGCAGAGGGG 
4251 TCAGTTCACA ACCATCTGCT CAGGAGGGTC AAGATGGGTG GTCTTTATGC 
4301 TGAAGGTCTG TGATTAGAGG AGCTGGTTGC TAAATTTTGA GGAGTACCTT 
4351 TTGCTCTGTG CTGGACATCT AAATATGCAT GTTAACTGTG TTCTTTAACA 
4401 TTTCCAGGAG ACTATAAACA TGTTTGAAAG GAACATTGTT GACATGGTAG 
4451 GACTGTTTAT CGAAAATGTC CAAAGCCTAT ATCCTTTCTG TGATGACCTT 
4501 CCCCATGGGG AGGTGCTACA GAGCCCCTGG GCTTGTCCCG GCCTCTGGAC 
4551 AAAAGAATGT TCCACAGGGT CTGAGGAGGT TTCCCGACCC TCAGAACAAT 
4601 GATGGCCTGG TTAGAGCTGT GGTTTGGATG CCCAGAGGGA CAACATCCAA 
4651 ACTGTTTGCA GTAGGCTCCC AGCATGATTG TTCTCATATG AGTGATGTTC 
4701 ACTAGGAAAT GACGCCCCCT GTGTTGCAGG CAAGCACACT CTGGGGTTGA 
4751 GGCAACCCCC ACGTGGAAGA CACTATAAGG AGTACATCAG GTGAAATGTT 
4801 AGGGTGAGGA GCCAACATCG GAGCATGGCC AACCCTTCTT CCACCCGAAC 
4851 TCAGGGCACT CCACATGGGG CAAACTGCTG TGCTCCAGCT AGCAGCAGCC 
4901 CTGTGGTCCT GCCCTCCTGG GGCTCACAGT CCCTCAGGGA GACAAGTTGT 
4951 AGAGGCAACA AGTGGTGCCA AATGCACAGG GTGAGAAGCA GTTAACCCAG 
5001 AGGCCAGGAG CCTCCATGCA GGAGGGAGAG AAGAGTGTGA TGGCAGGGGC 
5051 CGAGGGTCCG TCCGAGGTGT GGGGCAGGGG CAGGGAGTCG AGGAAGGCCC 
5101 AGGGTTCGGA GCTTGTGAGT GGACGGTGCT GCCAGCCAGA ATTTCCGAGC 
5151 TCGCCTTGGG CCCTTAAAGT CTGTCTCCCG CCGTCTGAGA GCATCAGGGA 
5201 CGCGCCGGGC CTGCTCCTCC CGGGCCTTTG CTTAACTCGG GGCTGCACGA 
5251 TGGCTCAGTG CCGGGACCTG GAGAATCACC ACCACGAGAA GCTCCTGGAG 
5301 ATCTCTATCA GCACCCTGGA GAAGATTGTC GAGGGCGACC TGGACGAGGA 
5351 CCTGCCTAAC GACCTGCGCG CGCTTTTTGT CGATAAAGAT ACGATTGTTA 
5401 ATGCTGTCGG GGCATCGCAC GACATCCACC TCCTGAAGAT TGACAATCGA 
5451 GAAGATGAGC TGGTGACCAG AATCAACTCT TGGTGTACAC GTTTAATAGA 
5501 CAGGATTCAC AAGGATGAGA TCATGAGGAA CCGCAAGCGC GTGAAGGAGA 
5551 TCAATCAGTA CATCGACCAC ATGCAGAGCG AACTGGACAA CCTGGAATGT 
5601 GGCGACATCC TAGACTAGAT GAATGTCAGC CACAGGAGCT TCTTCAAAAC 
5651 ATAGCACCAG CCCCAGCCAG GAGAAGGAAG TGCACACGCC TCACCCGCAC 
5701 CTCTAGAGAG TTGCTGGGCA TCTCTCAACC GCGATCCCCA ACACCATTCT 
5751 TCCCCCACCC CTGGAAAAAC TTCCAAAAGT AGAGAAAATA AAGGACTCAT 
5801 TTCACAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry HS1292248 from database EMBL: 
human STS SHGC-53917. 
Score = 874, P = 3.3e-33, identities = 180/185 



Medline entries 



No Medline entry 

Peptide information for frame 1 



ORF from 202 bp to 876 bp; peptide length: 225 
Category: similarity to known protein 



1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 

51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 

101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII 

151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 

201 SLSVSQPCET DSSSPQVSWK RGIEE 



BLASTP hits 

Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL:HSSDS22MR_1 gene: "sds22"; 

product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA 

Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143 

Entry A38439 from database PIR: 

suppressor protein sds22(+) - fission yeast (Schizosaccharomyces pombe) 
>TREMBL:SPSDS22_1 gene: "sds22+"; S. pombe sds22+ gene, complete cds. 
Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127 

Entry S43988 from database PIR: 

protein suppressor sds22 - fission yeast (Schizosaccharomyces pombe) 
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT 
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1 
regulatory subunit"; S. pombe chromosome I cosmid c4A8. 
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127 

Entry CEK10D2_5 from database TREMBL: 

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2. 

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125 



Alert BLASTP hits for DKFZphutel_20m11, frame 1 

No Alert BLASTP hits found 

Pedant information for DKFZphutel_20m11, frame 1 

Report for DKFZphutel_20m11.1 



[LENGTH] 225 

[MW] 25955.87 

[pI] 4.63 

[HOMOL] PIR:S68209 sds22 protein homolog - human 1e-18 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 2e-11 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 2e-11 

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YOR373w] 2e-06 

[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-05 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05 

[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05 

[FUNCAT] 04.07 rna transport [S. cerevisiae, YPL169c] 9e-04 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04 

[EC] 4.6.1.1 Adenylate cyclase 2e-06 

[PIRKW] nucleus 5e-16 

[PIRKW] duplication 2e-06 

[PIRKW] tandem repeat 2e-06 

[PIRKW] cAMP biosynthesis 2e-06 

[PIRKW] glycoprotein 2e-06 

[PIRKW] phosphorus-oxygen lyase 2e-06 

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 5e-16 

[SUPFAM] fibromodulin 3e-07 

[SUPFAM] yeast adenylate cyclase catalytic domain homology 2e-06 

[SUPFAM] yeast adenylate cyclase 2e-06 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 1 

[KW] All_Alpha 

SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN 

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA 

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 



Prosite for DKFZphutel_20m11.1 

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 
PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006 



(No Pfam data available for DKFZphutel_20m11.1) 

DKFZphutel_20m24 



group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical 
C.elegans protein and to yeast Alg9 protein. 

This protein is a putative mannosyl transferase that is involved in the assembly of the core 
oligosaccharide Glc3Man9GlcNAc2. 

The new protein can find application in modulation of glycosylation of proteins and as a new 
enzyme for biotechnologic production processes. 



strong similarity to S.cerevisiae Alg9p 

complete cDNA, complete cds, potential start at bp 23, few EST hits 
Alg9 is involved in the assembly of the core oligosaccharide 
Glc3Man9GlcNAc2 

HSAC381 corresponding genomic DNA (2 exons) 
HSB8954 corresponding genomic DNA (1 exon) 

Sequenced by AGOWA 

Locus: /map="11" 

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 



1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC 
51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG 
101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC 
151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA 
201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC 
251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA 
301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG 
351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT 
401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG 
451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG 
501 CTGTGTGCAA GAAGTTTGGG TTGCACGTGA GTCGAATGAT GCTAGCCTTC 
551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG 
601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG 
651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC 
701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT 
751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA 
801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG 
851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA 
901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG 
951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA 
1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT 
1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA 
1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT 
1151 CCACTTATAT GTCTCTGTGG CGCTGTGGCT CTCTCTGCAC TTCAGAAATG 
1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT 
1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT 
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA 
1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG 
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC 
1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA 
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC 
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA 
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG 
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA 
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG 
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA 
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG 
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA 
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA 
1951 ATAAAGGTCT TCTGACATGA AAAAAAAAAA AAAAAA 
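
The numbered sequence blocks in this listing can be joined back into one continuous string before any further analysis. The following is a minimal sketch, assuming each line starts with a position label followed by blocks of up to ten letters; the function name `parse_numbered_sequence` is hypothetical and not part of the original report. Because extraction sometimes splits a label such as `401` into `4 01`, every leading all-digit token is dropped.

```python
def parse_numbered_sequence(lines):
    """Join sequence blocks from lines such as '51 TGAAGGGCAG CGGGGCCAGC'.

    Assumption: position labels are purely numeric and come first on the line;
    sequence blocks never consist only of digits.
    """
    blocks = []
    for line in lines:
        tokens = line.split()
        # Drop the position label, even when it was split into several tokens.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        blocks.extend(tokens)
    return "".join(blocks)


sample = [
    "1 TTCTTTTTTC CCCAGGCTTG",
    "4 01 CATGCAAGAA TTCTACAAAC",  # label damaged during extraction
]
print(parse_numbered_sequence(sample))
```

Comparing the length of the joined string with the stated insert length is a quick consistency check on the extracted sequence.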



BLAST Results 



Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJl59ol, complete sequence. 
Length = 42,771 

Entry HSB8954 from database EMBL: 
cSRL-50A3-u cSRL flow sorted Chromosome 11 specific cosmid Homo 
sapiens genomic clone cSRL-50A3. 
Length = 601 



Medline entries 



96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae: 
identification of the ALG9 gene encoding a putative 
mannosyl transferase. 



Peptide information for frame 2 



ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 
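
The relation between ORF coordinates and peptide length can be checked with simple arithmetic. This is a sketch under the assumption that the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the helper name `orf_peptide_length` is illustrative only.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 23 bp to 1855 bp
print(orf_peptide_length(23, 1855))
```

For the ORF above this gives (1855 - 23 + 1) / 3 = 611 codons, in agreement with the stated peptide length.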



1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA 

51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF 

101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS 

151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT 

201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS 

251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT 

301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT 

351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR 

401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA 

451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK 

501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK 

551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR 
601 KAKQIRKKSG G 
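
The peptide above can be reproduced from the cDNA by translating from the stated start position. The sketch below assumes the standard genetic code and a 1-based start coordinate; `translate` is a hypothetical helper, not the tool used to generate the report, and it raises `KeyError` on characters other than A, C, G and T.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start_bp=1):
    """Translate from the 1-based position start_bp up to the first stop codon."""
    seq = dna.upper()
    peptide = []
    for pos in range(start_bp - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 52 bases of the insert; the potential start codon is at bp 23.
head = "TTCTTTTTTCCCCAGGCTTGCCATGGCTAGTCGAGGGGCTCGGCAGCGCCTG"
print(translate(head, start_bp=23))
```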

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20m24, frame 2 

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II., N = 1, Score = 957, P = 2.7e-96 

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 533, P = 2.3e-51 



>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II. 

Length = 653 

HSPs: 

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96 
Identities = 206/514 (40%), Positives = 296/514 (57%) 

Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107 

N W + FK LLS R+ A+ I+DCDE +NYWEP H +YGEGFQTWEYSP 

Sbjct: 43 NNPDNDWPFSFGSVFKMLLSIRISGAIWGIINDCDEVYNYWEPLHLFLYGEGFQTWEYSP 102 

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167 

YAIRSY Y+ LH PA+ A + KI+VF +R + + E Y + A+CKK + 

Sbjct: 103 VYAIRSYFYIYLHYIPASLFANLFGDTKIVVFTLIRLTIGLFCLLGEYYAFDAICKKINI 162 

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227 

R + F + S+GMF +S+AF+PSSFCM T + + + + + VA ++GW 
Sbjct: 163 ATGRFFILFSIFSSGMFLASTAFVPSSFCMAITFYILGAYLNENWTAGIFCVAFSTMVGW 222 

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287 
PFSA LGLPI D+L++K F SL+ + V+ DS+Y+GK V+APLNI LY 

Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282 

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347 

NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 
Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNIVIFAAPFGFPLS--LAYFTKVWMSQDRNVAL 340 

Query: 348 WLTLAPMYI WFI I FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400 

+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 
Sbjct: 341 YQRFAPI ILLAVTTAAWLLI FGSQAHKEERFLFPI YPFI AFFAALALDATNR LCLKK 397 

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460 

++ N L++ + F +LS SR+ ++ Y +++Y T+ T + 

Sbjct: 398 LGMD NILSILFILCFAILSASRTYSIHNNYGSHVEIYRSLNAELTNRT-NFKNF 450 

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW QLQFIPSEFRGQLPKPFAEGPL ATRI 511 

P+ VCVGKEW+RFPSSF +P +++FI SEFRG LPKPF + TR 

Sbjct: 451 HDPIRVCVGKEWHRFPSSFFIPQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510 

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 

+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQN Y 556 
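
The percentages in the HSP header follow from the identity and positive counts. The sketch below assumes that the report truncates rather than rounds the percentage, which is consistent with the figures printed in this listing (for example 296/514 is 57.6% and is shown as 57%); the function name `blast_percent` is illustrative.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


print(blast_percent(206, 514), blast_percent(296, 514))
```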



Pedant information for DKFZphutel_20m24 , frame 2 



Report for DKFZphutel_20m24 . 2 



[LENGTH] 611 

[MW] 69863.78 

[pI] 8.91 

[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93 

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNL219c] 4e-69 

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-69 

[PIRKW] glycosyltransferase 9e-68 

[PIRKW] transmembrane protein 9e-68 

[PIRKW] hexosyltransferase 9e-68 

[PROSITE] MYRISTYL 9 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] TRANSMEMBRANE 7 

[KW] LOW_COMPLEXITY 6.71 % 



SEQ MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 

SEG 

PRD ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccccccch 

MEM MMMMMM 

SEQ AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH 

SEG . . . xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhccccceeeccccceeeeeccccceeecccchhhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

SEG 

PRD cchhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 

SEG « .xxxxxxxxxxxxx 

PRD cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 

MEM MMMMMMMMMMMMMM 

SEQ LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLYNVFTPHGPDLYGT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 

MEM MMMMMMM . MMMMMMMMMMMMMMMMMMMMM 

SEQ EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 

SEG xxxxxxxxxxxxxxx 

PRD cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 
SEQ FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL 



PRD hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . 

SEQ FGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL 

SEG 

PRD eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc 

MEM 

SEQ LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD 

SEG 

PRD ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc 

MEM 

SEQ TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR 

SEG 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc 

MEM 

SEQ KAKQIRKKSGG 

SEG 

PRD hhhhhhccccc 

MEM 



Prosite for DKFZphutel_20m24 . 2 



PS00001   77->81    ASN_GLYCOSYLATION   PDOC00001 
PS00001   593->597  ASN_GLYCOSYLATION   PDOC00001 
PS00004   606->610  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   67->70    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   133->136  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   541->544  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   545->548  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   553->556  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   572->575  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   16->20    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   79->83    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   329->333  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   457->461  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   541->545  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   545->549  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   553->557  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   12->18    MYRISTYL            PDOC00008 
PS00008   14->20    MYRISTYL            PDOC00008 
PS00008   32->38    MYRISTYL            PDOC00008 
PS00008   47->53    MYRISTYL            PDOC00008 
PS00008   166->172  MYRISTYL            PDOC00008 
PS00008   182->188  MYRISTYL            PDOC00008 
PS00008   218->224  MYRISTYL            PDOC00008 
PS00008   222->228  MYRISTYL            PDOC00008 
PS00008   234->240  MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphutel_20m24.2) 
DKFZphutel_21dl5 



group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

Sequenced by MediGenomix 
Locus: /chromosome* 1 " 3" 
Insert length: 5292 bp 

Poly A stretch at pos. 5273, polyadenylation signal at pos. 5252 



1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 
51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 
101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 
151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 
201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 
251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 
301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 
351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 
401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 
451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 
501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 
551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 
601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 
651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 
701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 
751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 
801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 
851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 
901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 
951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 
1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 
1051 CAGCCAGATC CTGCCTACTG AAGAGTATGA AGAGGCAATG AGCACTATGC 
1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 
1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 
1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 
1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTCC TATCCCCGTG 
1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 
1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 
1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 
1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC 
1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 
1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 
1601 GAGGCCAGCT AGAGCTTGGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 
1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 
1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 
1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 
1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 
1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 
1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 
1951 GTGAAGAATC TGGCCGTGAC CTGGCTGCCA CACTGCTATA GGCCCCAGAA 
2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 
2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC 
2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 
2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 
2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 
2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 
2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 
2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 
2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 
2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG 
2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT 
2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 
2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA 
2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 
2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 
2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 
2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT 
2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG 
2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG 
3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA 
3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT 
3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT 
3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC 
3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA 
3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC 
3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT 
3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA 
3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA 
3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA 
3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA 
3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC 
3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG 
3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT 
3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA 
3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT 
3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC 
3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG 
3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG 
3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG 
4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG 
4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT 
4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT 
4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT 
4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT 
4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC 
4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA 
4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC 
4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG 
4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC 
4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAACTGG GCTATTACTC 
4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC 
4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT 
4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC 
4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG 
4751 TGACACACGG AGGCACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC 
4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG 
4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG 
4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG 
4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG 
5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC 
5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGGTTCCCA GCCGCGGGTC 
5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA 
5151 CTAAAGGTCT GGCCAATGTC TTGCCCCACC CCGCCAGCCG CGATACGGCG 
5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT 
5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG 



BLAST Results 



Entry HSU64252 from database EMBL: 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA 
101 RARPGCHGGS GGDRPAA 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 1 
No Alert BLASTP hits found 



Peptide information for frame 2 



ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC 
51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL FVHYSNGDES SDPGPQHRAQ 
101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 
151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2, 
Score = 106, P = 0.0067 



>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298 

HSPs: 

Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 36/103 (34%), Positives = 44/103 (42%) 

Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144 

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLS*GPGRLGPFWAARSGAPAARCAP 189 

AAP AA ++R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 8/21 (38%), Positives = 9/21 (42%) 

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 
Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232 



Pedant information for DKFZphutel_21dl5, frame 1 



Report for DKFZphutel_21dl5 . 1 



[LENGTH] 117 

[MW] 11797.32 

[pI] 10.68 

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22 

[KW] LOW_COMPLEXITY 38.46 % 



SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 

SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc 

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 



(No Prosite data available for DKFZphutel_21dl5.1) 
(No Pfam data available for DKFZphutel_21dl5.1) 



Pedant information for DKFZphutel_21dl5, frame 2 



Report for DKFZphutel_21dl5 . 2 

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 

SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG 

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 

MEM 

SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . xxxx 

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 

MEM 

SEQ GAPAARCAPFP 

SEG xxxxxxxxx . . 

PRD ccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_21dl5.2) 
(No Pfam data available for DKFZphutel_21dl5.2) 
DKFZphutel_22d2 



group: signal transduction 

DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the 
protein. Additionally, the putative protein contains an EF-hand for calcium-binding. 

G-proteins are involved in various signal transduction pathways, transferring the signal of 
cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 
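
The GTP-binding assignment can be illustrated by scanning the peptide for the P-loop motif reported under the Prosite entry ATP_GTP_A. The sketch below assumes the PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST] and 1-based residue numbering; the function name `find_p_loops` and the two fragments, taken from the start of the frame 1 peptide and from residues 421 onward, are chosen for illustration only.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST].
# A lookahead is used so that overlapping matches would also be reported.
P_LOOP = re.compile(r"(?=[AG].{4}GK[ST])")


def find_p_loops(peptide, offset=1):
    """Return the 1-based start positions of P-loop motifs.

    offset is the residue number of the first character of `peptide`,
    which allows scanning a fragment while reporting full-length coordinates.
    """
    return [m.start() + offset for m in P_LOOP.finditer(peptide.upper())]


print(find_p_loops("MKKDVRILLVGEPRVGKTSL"))          # N-terminal fragment
print(find_p_loops("CNVIGVKNCGKSGVLQALLG", offset=421))
```

Under these assumptions the two fragments each contain one motif, which would be in line with the count of two ATP_GTP_A sites given in the Pedant report for this protein.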



similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and YAL048c 



Sequenced by BMFZ 
Locus: /map="17" 
Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 



1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC 
51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC 
101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT 
151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT 
201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC 
251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCTAA TGTCATCTGT 
301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG 
351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT 
401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG 
451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA 
501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC 
551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG 
601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC 
651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC 
701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC 
751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT 
801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA 
851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC 
901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA 
951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT 
1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT 
1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA 
1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC 
1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA 
1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC 
1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC 
1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT 
1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA 
1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT 
1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG 
1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA 
1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC 
1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA 
1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG 
1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG 
1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT 
1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG 
1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT 
1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG 
1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC 
2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC 
2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA 
2101 AAATGGGTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG 
2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT 
2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG 
2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA 
2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA 
2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT 
2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC 
2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT 
2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT 
2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG 
2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG 
2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 
2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 
2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 
2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 
2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 
2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 
2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 
3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 
3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 
3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 
3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 
3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS ***, NF1 -related locus, Direct Submission; 

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396 

Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 



1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER 
51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL 
101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK 
151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN 
201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG 
251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE 
301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC 
351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT 
401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR 
451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN 
501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK 
551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11., N = 1, Score = 1357, P = 1.1e-138 

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein"; 
S.pombe chromosome III cosmid c320., N = 1, Score = 889, P = 4.4e-89 

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid 
C47C12., N = 2, Score = 408, P = 5.6e-74 

PIR:S51971 probable membrane protein YAL048c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 677, P = 1.3e-66 



>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11. 

Length = 625 

HSPs: 

Score = 1357 (203.6 bits), Expect = 1.1e-138, P = 1.1e-138 
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLSIKEE 68 

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKSDLV 123 

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQSFGEYHETPVILVGNKSDGT 128 

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEIFYYAQKAVIYPTRPLYDADTKQLTD 187 

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243 

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALIRVFKICDRDNDGYLSDTELNDFQKLCFGIPLTSTALEDVKRAVSDGCPDGVAN 247 

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTELNH 303 

L L GFL+LH LFI+RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS AVTVTRDKKI DLQKKQTQRNVF 419 

+ W +TT +++ + EL YLG+ + +A ++ VTR++K DL+ T R VF 

Sbjct: 368 AYWNMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSIRVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDI 476 

+C V+G K+ GK+ +Q+L GR + +1 H S + IN V V + KYLLL ++ 
Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486 

Query: 477 SESEFLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQ 536 

S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 
Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546 

Query: 537 EYSISPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 580 

+ + P +FCR+ ++P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPIKFSTGNIGQSSSPIFEQLAMMAVYP 590 



Pedant information for DKFZphutel_22d2, frame 1 



Report for DKFZphutel_22d2 . 1 



[LENGTH] 580 
[MW] 66541.61 
[pI] 5.56 
[HOMOL] TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid K08F11. 1e-149 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-81 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR055w] 3e-11 
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 8e-09 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 4e-08 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 4e-08 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YPR165w] 7e-08 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 9e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-07 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 3e-06 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 9e-04 
[BLOCKS] BL00410A Dynamin family proteins 
[SCOP] dlplk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 2e-42 
[SCOP] dlguaa_ 3.25.1.3.10 Rap1A [Human (Homo sapiens) 5e-59 
[PIRKW] transmembrane protein 1e-79 
[PIRKW] membrane trafficking 2e-06 
[PIRKW] acetylated amino end 3e-09 
[PIRKW] prenylated cysteine 3e-09 
[PIRKW] signal transduction 1e-07 
[PIRKW] transforming protein 3e-09 
[PIRKW] immediate-early protein 8e-06 
[PIRKW] alternative splicing 4e-08 
[PIRKW] P-loop 1e-10 
[PIRKW] lipoprotein 7e-10 
[PIRKW] proto-oncogene 3e-09 
[PIRKW] methylated carboxyl end 3e-09 
[PIRKW] membrane protein 3e-09 
[PIRKW] GTP binding 1e-10 
[PIRKW] thiolester bond 7e-10 
[SUPFAM] ras transforming protein 1e-10 
[PROSITE] ATP_GTP_A 2 
[PROSITE] MYRISTYL 3 
[PROSITE] EF_HAND 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Irregular 
[KW] 3D 



SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSE 

ljai- . . . EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC 

SEQ AEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS 

ljai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT 

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE 

ljai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH 

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG 

ljai- 

SEQ VADSGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTE 

ljai- 

SEQ LNHHAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQ 

ljai- 

SEQ GFLSQWTLTTYLDVQRCLEYLGYLGYSILTEQESQASAVTVTRDKKIDLQKKQTQRNVFR 

ljai- 

SEQ CNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDISESE 

ljai- 

SEQ FLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQEYSI 

ljai- 

SEQ SPTDFCRKHKMPPPQAFTCNTADAPSKDI FVKLTTMAMYP 

ljai- 



Prosite for DKFZphutel_22d2.1

PS00001 118->122 ASN_GLYCOSYLATION PDOC00001
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001
PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005
PS00006 59->63 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 311->315 CK2_PHOSPHO_SITE PDOC00006
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006
PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00007 153->161 TYR_PHOSPHO_SITE PDOC00007
PS00007 376->384 TYR_PHOSPHO_SITE PDOC00007
PS00007 153->162 TYR_PHOSPHO_SITE PDOC00007
PS00007 448->457 TYR_PHOSPHO_SITE PDOC00007
PS00008 240->246 MYRISTYL PDOC00008
PS00008 425->431 MYRISTYL PDOC00008
PS00008 433->439 MYRISTYL PDOC00008
PS00017 11->19 ATP_GTP_A PDOC00017
PS00017 425->433 ATP_GTP_A PDOC00017
PS00018 197->210 EF_HAND PDOC00018



Pfam for DKFZphutel_22d2.1

HMM_NAME Ras family (contains ATP/GTP binding P-loop)

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK
++L+G+ VGK++L ++ EF+EE +P ++ T ++ +++
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIr.NWweEIr
ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ I+
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102

HMM RHCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKT
+ D+D+ P +LVGNK+DL + ++T + +E+SAK+
Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151

HMM NiNVEEAFMEIvRellqrMqeqNqteNinidQpsrnrkrCCCIM*
N+ E F+ + +++L + +++ +++++ + C+
Query 152 LKNISELFYYAQKAVLHPT GPLYCPEEKEMK-PACI -- 186






DKFZphutel_22el2 



group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein with similarity to yeast, C. elegans,
Drosophila and mammalian proteins.

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway
involving the EGF receptor.

The new protein can find application in modulating the cornichon-modulated signal transduction
pathway and also EGF receptor signaling processes.

strong similarity to S. cerevisiae YGL054c and cornichon
complete cDNA, complete cds, EST hits

cornichon is required for signal transduction in EGF receptor
signal processing

Sequenced by BMFZ 

Locus: unknown 

Insert length: 519 bp 

Poly A stretch at pos. 499, no polyadenylation signal found



1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 

51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 

101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 

151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT 

201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 

251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 

301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 

351 TTGAAGTCAG CCTACACTAC AGTGCACAGT TGAGGAGCCA GAGACTTCTT 

401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 

451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 

501 TGAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



95300228: 

cornichon and the EGF receptor signaling process are necessary for both 
anterior-posterior 

and dorsal-ventral pattern formation in Drosophila. 



Peptide information for frame 1 



ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 
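
The ORF coordinates and the peptide length reported above are tied together by simple arithmetic: (330 - 55 + 1) / 3 = 92 codons. The following is a minimal Python sketch of that relationship, not the pipeline used for the original annotation. It assumes the coordinates are 1-based and inclusive and exclude the stop codon (consistent with this entry, where the codon after position 330 is TGA), and it assumes the standard genetic code. The function names `orf_peptide_length` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table applies to this cDNA).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(orf_peptide_length(55, 330))      # 92
print(translate("ATGGAGGCGGTGGTG"))     # MEAVV (bases 55-69 of the insert)
```

The same arithmetic can be used to sanity-check the other ORF lines in this listing against their stated peptide lengths.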



1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 185, P = 5.7e-17

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog";
S.pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 162, P = 4.8e-12

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA,
complete cds., N = 1, Score = 141, P = 8e-10

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09

>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces
cerevisiae)

Length = 138

HSPs:

Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 35/85 (41%), Positives = 56/85 (65%)

Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 60 

M A +F+ +++ C +F V+F I +DLE DYIN CSK+NK + PE H +++ 
Sbjct: 1 MGAWLFILAWVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFIFLLNLPVATWNIYRM 85 

L L++ +WF+FLLNLPV +N+ ++ 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 37 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17
Identities = 7/9 (77%), Positives = 9/9 (100%)

Query: 82 IYRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 
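
The "Identities" figure in each HSP header can be recomputed directly from the aligned Query and Sbjct rows. The sketch below is an illustration under stated assumptions, not the BLAST implementation: it counts columns where both rows carry the same residue and reports the percentage truncated to an integer, which is what the headers in this listing appear to do (7/9 is shown as 77%, 56/85 as 65%). The function name `identity_stats` is hypothetical.

```python
def identity_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, alignment length, truncated percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    # A column counts as an identity only if both rows hold the same residue
    # and that residue is not a gap character.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    length = len(query)
    return identities, length, identities * 100 // length


# Second HSP above: Query 82-90 against Sbjct 123-131.
print(identity_stats("IYRMILALI", "LYRMIMALI"))  # (7, 9, 77)
```

The "Positives" figure additionally depends on the substitution matrix used for scoring, so it cannot be recovered from the two rows alone.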



Pedant information for DKFZphutel_22el2, frame 1 



Report for DKFZphutel_22el2 . 1 



[LENGTH] 92 

[MW] 10614.98 

[pI] 5.04

[HOMOL] PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces cerevisiae) 5e-14

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054c] 2e-15

[PIRKW] transmembrane protein 2e-11

[PROSITE] CK2_PHOSPHO_SITE 3

[KW] SIGNAL_PEPTIDE 33

[KW] TRANSMEMBRANE 2



SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV

PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh

MEM MMMMMMMMMM

SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND

PRD hhhhhhhheeecccccchhhhhhhhhhhhccc 

MEM MMMMMMMMMMMMMMMMMMM . . MMMMMMM 



Prosite for DKFZphutel_22el2.1

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
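
The three CK2_PHOSPHO_SITE hits above can be reproduced by scanning the peptide with the corresponding PROSITE pattern, [ST]-x(2)-[DE]. The following is a minimal sketch using Python regular expressions, not the PROSITE scanner itself. Two points are assumptions drawn from this listing rather than from any documented convention: the start coordinate is 1-based, and the second coordinate is one past the last matched residue. The function name `scan_motif` and the `MOTIFS` dictionary are hypothetical.

```python
import re

# PROSITE patterns written as regular expressions:
#   PS00006 CK2_PHOSPHO_SITE  [ST]-x(2)-[DE]
#   PS00005 PKC_PHOSPHO_SITE  [ST]-x-[RK]
MOTIFS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
}

# Frame 1 peptide of DKFZphutel_22el2, copied from the listing above.
PEPTIDE = (
    "MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIP"
    "ELIGHTIVTVLLLMSLHWFIFLLNLPVATWNIYRMILALIND"
)


def scan_motif(seq: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for all matches, including overlapping ones."""
    hits = []
    # The lookahead makes matches zero-width so overlapping sites are found.
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1               # 1-based start
        end = start + len(m.group(1))       # one past the last matched residue
        hits.append((start, end))
    return hits


print(scan_motif(PEPTIDE, MOTIFS["CK2_PHOSPHO_SITE"]))  # [(9, 13), (26, 30), (28, 32)]
print(scan_motif(PEPTIDE, MOTIFS["PKC_PHOSPHO_SITE"]))  # []
```

Short patterns of this kind match frequently by chance, which is why the listing treats them as non-predictive motifs.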



(No Pfam data available for DKFZphutel_22el2.1)



DKFZphutel_22n2



group: uterus derived 

DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chr11 linkage group"
Insert length: 1556 bp

Poly A stretch at pos. 1534, no polyadenylation signal found

1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 

51 ACTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 

101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 

151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 

201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 

251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG 

301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT 

351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC 

401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC 

451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC 

501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTGACCTGG ACCACAAACT 

551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT 

601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG 

651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT 

701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG 

751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG 

801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA 

851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGGTCCC 

901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT 

951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT 

1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 

1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 

1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 

1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCTGCTGTT TCAAGGCTGA 

1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 

1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT 

1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG 

1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA 

1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG 

1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC 

1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 

1551 AAAAAA 



BLAST Results 



Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 255 bp to 1166 bp; peptide length: 304 
Category: putative protein



1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK 
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW 
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY 
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP 
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET 
301 LTFS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22n2, frame 3 

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N 
Score = 132, P = 1e-05



>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae)
Length = 562



Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05
Identities = 24/63 (38%), Positives = 35/63 (55%)

Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 

D 

Sbjct: 557 IID 559 

Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04
Identities = 20/52 (38%), Positives = 33/52 (63%)

Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55 

N+ +E ++E+ +E + T + + N+DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545



Pedant information for DKFZphutel_22n2, frame 3 



Report for DKFZphutel_22n2 . 3 



[LENGTH] 304

[MW] 34285.85

[pI] 4.37

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 11.84 %



SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc 

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc 

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID 

SEG 

PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch 

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI 

SEG 

PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh 

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET

SEG

PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc

SEQ LTFS 

SEG 

PRD cccc 



Prosite for DKFZphutel_22n2.3

PS00001 4->8 ASN_GLYCOSYLATION PDOC00001
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001
PS00001 290->294 ASN_GLYCOSYLATION PDOC00001
PS00004 17->21 CAMP_PHOSPHO_SITE PDOC00004
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006
PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006
PS00006 185->189 CK2_PHOSPHO_SITE PDOC00006
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006
PS00009 280->284 AMIDATION PDOC00009



(No Pfam data available for DKFZphutel_22n2.3)



DKFZphutel_22o2



group: uterus derived 

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



similarity to S.pombe SPBC3E7.03c 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: map="11p15.5"
Insert length: 2714 bp 

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677 



1 GCAGGGCACG GTGGGGGCTG AGATCGT.TTC CTGTTGGAAC TTCTGGCCCA 
51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 
101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 
151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGTTA AAGCGCCACC 
201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 
251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG 
301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG
351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 
401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 
451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC 
501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 
551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 
601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 
651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 
701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 
751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 
801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 
851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 
901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 
951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 
1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA 
1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 
1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 
1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 
1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 
1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC 
1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC 
1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC 
1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT 
1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC 
1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 
1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT 
1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 
1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA 
1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG 
1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT 
1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC 
1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG 
1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT 
1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG 
2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT 
2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC 
2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA 
2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC 
2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG 
2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA 
2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG 
2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG 
2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT 
2451 CCTGTCTTCA CTGAGCGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG 
2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 
2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 
2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 
2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA



BLAST Results



Entry AF015416 from database EMBL: 

Homo sapiens chromosome 11 from 11p15.5 region, complete sequence.
Score = 3356, P = 2.0e-144, identities = 672/673

Entry HS263253 from database EMBL:
human STS SHGC-15914.
Score = 1143, P = 9.0e-46, identities = 245/255



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 



1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL
51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV
101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL
151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL
201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH
251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST
301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK
351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE
401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD
451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR
501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22o2, frame 2 

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023



>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7.
Length = 362



Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03
Identities = 71/289 (24%), Positives = 124/289 (42%)

Query: 215 SQETERAM-EILKVLFNITLDSIKGEVDEEDAALYRHLGTLLRHCVMIATAGDRTEEFHG 273 

SQ+ E + EIL++LF 1+ S E DE+ L L+ + + 

Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILL LVIEIWASSLNNNPNSPLRW 65 

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLIFLEKRLHKTH RL 327

HA N LL NL L LD + + T + +1 + +LEK L+ + 
Sbjct: 66 HATN ALLS FNLQLL S LDQA I Y VS E I AC QT LQSILISREVEYLEKGLNLCFDIAAKY 121 

Query: 328 KESVAPVLSVLTECARMHRPARKFLKAQGWPPPQVLPPLRDVRTRP-EVGEMLRNKLVRL 386 

+ ++ P+L++L + +LPDR++G+R L+RL 

Sbjct: 122 QNTLPPILAILLSLLSFFNIKQNL SMLLFPTNDDRKQSLQKGKSFRCLLLRL 173 

Query: 387 MT-HLDTDVKRVAAEFLFVLCSESVPRFIKYTGYGNAAGLLAARGLMAGGRPEGQYS 442 

+T++ ALLC + + GGAG+ M P+ + 

Sbjct: 174 LTIPIVEPIGTYYASLLNELCDGDSQQIARIFGAGYAMGISQHSETMPFPSPLSKAASPV 233 

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV--EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499

+ + +E +I+P+TG + +E +++E+KE EA +L +F +L +N 

Sbjct: 234 FQKNSRGQENTEENNLAIDPITGSMCTNRNKSQRLE-LSQEEKEREAERLFYLFQRLEKN 292

Query: 500 RVIQ 503
IQ 

Sbjct: 293 STIQ 296 



Pedant information for DKFZphutel_22o2, frame 2 



Report for DKFZphutel_22o2.2 



[LENGTH] 537

[MW] 60372.53

[pI] 5.20

[BLOCKS] BL00415L Synapsins proteins

[PROSITE] MYRISTYL 4

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] PKC_PHOSPHO_SITE 10

[PROSITE] ASN_GLYCOSYLATION 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 9.50 %



SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP 

SEG 

PRD ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhhhhhhhhhhhhccccc 

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC 

SEG 

PRD cceeeeeccccccccccccccccchhhhhhhhhhhhceeeeccccccccchhhhhhhhhh 

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ 

SEG xxxxxxxxxxxxxxx... 

PRD hhhhccccchhhhhhhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh 

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV 

SEG 

PRD hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccchhh 

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc 

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP 

SEG 

PRD eeeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc 

SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG

SEG xxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc 

SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE 

SEG xxxxxxxxxxxxxxx xxxxxxxxx 

PRD chhhhhhhhhccccccccccccccccccchhhhhhhhhccccccceeecccccccchhhh 

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 



Prosite for DKFZphutel_22o2.2

PS00001 230->234 ASN_GLYCOSYLATION PDOC00001
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005
PS00005 84->87 PKC_PHOSPHO_SITE PDOC00005
PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005
PS00005 145->148 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 463->466 PKC_PHOSPHO_SITE PDOC00005
PS00005 508->511 PKC_PHOSPHO_SITE PDOC00005
PS00006 12->16 CK2_PHOSPHO_SITE PDOC00006
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006
PS00006 388->392 CK2_PHOSPHO_SITE PDOC00006
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006
PS00006 530->534 CK2_PHOSPHO_SITE PDOC00006
PS00008 57->63 MYRISTYL PDOC00008
PS00008 420->426 MYRISTYL PDOC00008
PS00008 424->430 MYRISTYL PDOC00008
PS00008 430->436 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_22o2.2)



DKFZphutel_23el3



group: metabolism 

DKFZphutel_23el3 encodes a novel 196 amino acid protein with similarity to 27K heat shock
proteins.

The novel protein contains a serine protease of the subtilase family with an aspartic
acid-containing active site. Subtilases are an extensive family of serine proteases whose catalytic
activity is provided by a charge relay system similar to that of the trypsin family of serine
proteases but which evolved by independent convergent evolution. The sequences around the
residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely
different from those of the analogous residues in the trypsin serine proteases. Thus the novel
protein is a new member of this family.

The new protein can find application in modulation of proteinase activity in cells and as a
new enzyme for proteomics and biotechnological production processes.



heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group"
Insert length: 1854 bp 

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810 



1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC 

51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC 

101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG 

151 GGGGTTACCT TTGGGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC 

201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA 

251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG 

301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC 

351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT 

401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG 

451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC

501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC 

551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG 

601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT 

651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA 

701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG 

751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG 

801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT 

851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA 

901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC 

951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC 

1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA 

1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG 

1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT 

1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA 

1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC 

1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC 

1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC 

1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT 

1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC

1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC 

1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT 

1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC 

1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT 

1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA 

1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT 

1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA 

1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAAAA 

1851 AAAA 
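
The polyadenylation signal reported for this insert can be located by searching the 3' end for the canonical hexamer AATAAA. The sketch below is an illustration only, not the tool used for the original annotation. It is an observation from this listing, not a documented convention, that the reported position (1810) coincides with the zero-based offset of the hexamer, which starts at base 1811 in the 1-based numbering of the sequence lines. The function name `find_polya_signals` and the `offset` parameter are hypothetical.

```python
import re


def find_polya_signals(seq: str, offset: int = 0) -> list[int]:
    """Return zero-based offsets of every AATAAA hexamer, shifted by `offset`."""
    seq = seq.upper()
    # Lookahead keeps matches zero-width so overlapping hexamers are reported.
    return [m.start() + offset for m in re.finditer(r"(?=AATAAA)", seq)]


# Bases 1801-1854 of the insert, copied from the last two sequence lines.
tail = (
    "TAATAATAAT" "AATAAAGGAG" "CTGACGTTCT" "TAAAAAAGAA" "AAAAAAAAAA"
    "AAAA"
)

# The tail begins at 1-based position 1801, i.e. zero-based offset 1800.
print(find_polya_signals(tail, offset=1800))  # [1810]
```

Variant signals such as ATTAAA are not covered by this sketch; they could be added to the regular expression if needed.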



BLAST Results 



Entry HS286348 from database EMBL: 
human STS TIGR-A002 J47.
Score = 510, P = 1.2e-16, identities = 102/102



Medline entries



95394379: 

Cloning and sequencing of a cDNA encoding the canine HSP27 protein. 
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 



Peptide information for frame 3 



ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39) 



1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD 

51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV 

101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD

151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_23el3, frame 3 

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P = 4.3e-27

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus
heat shock protein HSP27 internal deletion variant b mRNA, complete
cds., N = 1, Score = 301, P = 8.9e-27



>PIR:JC4244 heat-shock 27K protein - dog 
Length = 209 

HSPs: 



Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27
Identities = 80/182 (43%), Positives = 102/182 (56%)

Query: 1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58
M + ++PFS PS DPFRD P SRL D FG+ P++ WW S
Sbjct: 1 MTERRVPFSLLRSPSW DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50

Query: 59 AW PGTLRSGMVP RGPTATARFGVPAEGR -- TPPPFPG EPWKVCVNVHSF 105
WPG +R +P GP A A PA R + G + W+V ++V+ F
Sbjct: 51 GWPGYVRP -- IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108

Query: 106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165
PEEL VKTKDG VE++GKHEE+Q E G +S+ T K LP VDP V +SLSPEG L
Sbjct: 109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168

Query: 166 IEAPQVPPYSTFGE 179
+EAP P + E
Sbjct: 169 VEAPMPKPATQSAE 182





Pedant information for DKFZphutel_23el3, frame 3 



Report for DKFZphutel_23el3 . 3 



[LENGTH] 196

[MW] 21604.37

[pI] 5.00

[HOMOL] PIR:JC4244 heat-shock 27K protein - dog 3e-22

[BLOCKS] BL01031C

[PIRKW] blocked amino end 1e-13

[PIRKW] acetylated amino end 4e-13

[PIRKW] phosphoprotein 7e-21

[PIRKW] glycoprotein 2e-11

[PIRKW] heat shock 7e-21

[PIRKW] molecular chaperone 4e-13

[PIRKW] alternative splicing 1e-19

[PIRKW] eye lens 6e-14

[PIRKW] stress-induced protein 7e-21

[SUPFAM] alpha-crystallin 7e-21

[PROSITE] SUBTILASE_ASP 1

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Heat shock hsp20 proteins

[KW] All_Beta

[KW] LOW_COMPLEXITY 7.14 %



SEQ MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW 

SEG xxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc 

SEQ PGTLRSGMVPRGPTATARFGVPAEGRTPPPFPGEPWKVCVNVHSFKPEELMVKTKDGYVE

SEG 

PRD cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeecccceee 

SEQ VSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLIIEAPQVPPYSTFGES 

SEG 

PRD eccchhhhhcccceeeeccccccccccccccceeeecccccceeeeeccccccccccccc 

SEQ SFNNELPQDSQEVTCT 

SEG 

PRD cccccccccceeeccc 



Prosite for DKFZphutel_23el3.3

PS00001 138->142 ASN_GLYCOSYLATION PDOC00001
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 140->143 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006
PS00008 62->68 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00136 28->39 SUBTILASE_ASP PDOC00125



Pfam for DKFZphutel_23el3.3

HMM_NAME Heat shock hsp20 proteins

HMM       *AMMrpPWDWRE DpDHFeVrMDMPGFKPEEIKVkVEDNNVLvIeG
          A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G
Query  77 ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123

HMM       EHEREEEREDDkWWWHERI YRHFMRRFrLPENVDpDql kAsMSdNGVLTI
          +HE E++ + + ++ F +++LP +VDP + AS+S++G+L I
Query 124 KHE EKQQ EGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLII 166

HMM       TVPKpEP*
          ++P ++P
Query 167 EAPQVPP 173
DKFZphutel_23gll



group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S. pombe
SPAC31G5.12c and S. cerevisiae Maf1p.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



similarity to SPAC31G5.12c and Maf1p

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 1674 bp 

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644 



1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 
51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 
101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 
151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 
201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 
251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG 
301 CCCAGCCAGA CCCGGCCCGG CGCGGCCTGA TCTAACCCAG CCAGGCAGGC 
351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT 
401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA 
451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG 
501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC 
551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA 
601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC 
651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC 
701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC 
751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC 
801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC 
851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC 
901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 
951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 
1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 
1051 TGGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 
1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 
1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 
1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 
1251 GCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 
1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 
1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 
1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC 
1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA 
1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 
1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 
1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 
1651 AGTTTCTGTG ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 393 bp to 1160 bp; peptide length: 256 
Category: similarity to known protein 
1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE
51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL 
101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL 
151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF 
201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR 
251 VPVICI 
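
The ORF coordinates and the peptide length quoted above are linked by simple arithmetic. The following is a minimal sketch of that relationship, not part of the original report: the helper name `orf_summary` is hypothetical, and it assumes 1-based inclusive coordinates that do not include the stop codon, which is consistent with the figures given here (393 to 1160 bp, 256 residues, frame 3) but is not stated explicitly in the source.

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (peptide_length, reading_frame) for a 1-based inclusive ORF.

    Assumption: the quoted range covers coding codons only (no stop codon).
    """
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    peptide_length = n_bases // 3
    reading_frame = (start_bp - 1) % 3 + 1  # frames numbered 1..3
    return peptide_length, reading_frame


print(orf_summary(393, 1160))  # (256, 3)
```

Under these assumptions the helper reproduces both the peptide length and the frame number reported for this clone.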

BLASTP hits 
Entry SPAC31G5_12 from database TREMBL:

gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c31G5.

Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127

Entry SPD656_1 from database TREMBL:

product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial
cds.

Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127

Entry S50986 from database PIR:

MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST
MAF1 PROTEIN. >TREMBL:SC19492_1 gene: "MAF1"; product: "Maf1p";
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds.
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S. cerevisiae
chromosome IV cosmid 8119.

Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133

Entry AF098499_2 from database TREMBL:

gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8.

Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252



Alert BLASTP hits for DKFZphutel_23gll, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphutel_23gll, frame 3



Report for DKFZphutel_23gll.3



[LENGTH] 256

[MW] 28869.95

[pI] 4.51

[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein";
S.pombe chromosome I cosmid c31G5. 4e-23

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c]
6e-13

[PROSITE] MYRISTYL 3

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 7.81 %



SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS 

SEG 

PRD cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc 

SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc 

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED

SEG

PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc

SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA

SEG xxxxxxxxxxxxxxxxxx

PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc

SEQ EETSTMEEDRVPVICI

SEG xx

PRD cccccccccceeeccc 
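
The LOW_COMPLEXITY keyword in the Pedant report appears to be the fraction of residues masked on the SEG lines. The sketch below illustrates that reading; it is an assumption rather than documented Pedant behaviour, the function name `low_complexity_percent` is hypothetical, and the masked-residue count (18 + 2) is taken by counting the `x` characters on the SEG lines above.

```python
def low_complexity_percent(masked_residues: int, length: int) -> float:
    """Percentage of residues masked as low complexity ('x' on the SEG lines)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * masked_residues / length


# Assumption: 18 + 2 masked residues counted on the SEG lines, 256 residues in total.
print(f"{low_complexity_percent(18 + 2, 256):.4f} %")  # 7.8125 %
```

The result agrees with the reported 7.81 % to within the precision shown in the report.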






WO 01/12659 



PCT/IB00/01496



Prosite for DKFZphutel_23gll.3

PS00001 6->10 ASN_GLYCOSYLATION PDOC00001
PS00001 101->105 ASN_GLYCOSYLATION PDOC00001
PS00001 132->136 ASN_GLYCOSYLATION PDOC00001
PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005
PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00008 66->72 MYRISTYL PDOC00008
PS00008 181->187 MYRISTYL PDOC00008
PS00008 239->245 MYRISTYL PDOC00008



(No Pfam data available for DKFZphutel_23gll . 3) 
DKFZphutel_24cl9



group: transmembrane protein 

DKFZphutel_24cl9 encodes a novel 195 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 

unknown 

membrane regions: 1 

Summary DKFZphutel_24cl9 encodes a novel 195 amino acid protein, with 
no similarity to known proteins. 



unknown 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus : unknown 

insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 



1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 
51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 
101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA 
151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 
201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 
251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 
301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 
351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT 
401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC 
451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA 
501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG 
551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT 
601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG 
651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA 
701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA 
751 ACAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 



1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA
51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET 
101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR 
151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphutel_24cl9, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_24cl9, frame 2



Report for DKFZphutel_24cl9.2 



[LENGTH] 195 

[MW] 21527.45

[pI] 9.36

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] TRANSMEMBRANE 1 



SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV 

PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh 

MEM 

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF

PRD hhhhhhhccccccceeeeecccccccccccccccccccccccccccccceeeecccceee 

MEM MMMMMMMMMMMMMM 

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL 

PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhcchhhhh 

MEM MMM - 

SEQ LIKALQLSEPGKEIH 

PRD hhhhhhhcccccccc 

MEM 



Prosite for DKFZphutel_24cl9.2

PS00001 11->15 ASN_GLYCOSYLATION PDOC00001
PS00001 34->38 ASN_GLYCOSYLATION PDOC00001
PS00001 59->63 ASN_GLYCOSYLATION PDOC00001
PS00005 18->21 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00008 40->46 MYRISTYL PDOC00008
PS00008 47->53 MYRISTYL PDOC00008
PS00008 68->74 MYRISTYL PDOC00008
PS00008 110->116 MYRISTYL PDOC00008
PS00008 127->133 MYRISTYL PDOC00008
PS00008 142->148 MYRISTYL PDOC00008
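
The ASN_GLYCOSYLATION hits (PS00001) can be reproduced with a short pattern scan. The sketch below is illustrative only and is not the tool that generated the report: it assumes the commonly cited consensus N-{P}-[ST]-{P}, expressed here as a regular expression, and the function name `n_glyc_sites` is hypothetical. The peptide is the 195-residue frame 2 translation of DKFZphutel_24cl9.

```python
import re

# Simplified form of the N-glycosylation consensus N-{P}-[ST]-{P};
# the lookahead lets overlapping candidate sites be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")

PEPTIDE = (
    "MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIA"
    "NSLFRRILNVTKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCET"
    "CTITRSGLTGLVIGGLYPVFLAIPVNGGLAARYQSALLPHKGNILSYWIR"
    "TSKPVFRKMLFPILLQTMFSAYLGSEQYKLLIKALQLSEPGKEIH"
)


def n_glyc_sites(seq: str) -> list[int]:
    """Return 1-based start positions of candidate N-glycosylation sites."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq)]


print(n_glyc_sites(PEPTIDE))  # [11, 34, 59]
```

The three start positions coincide with the start coordinates of the PS00001 entries listed for this peptide. Such short patterns occur frequently by chance, so a match is only a candidate site.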



(No Pfam data available for DKFZphutel_24cl9.2)
DKFZphutel_24ell



group: intracellular transport and trafficking 

DKFZphutel_24ell encodes a novel 226 amino acid protein with similarity to the human/mouse Golgi
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular
membrane-bound compartment. Thus, the novel protein also seems to be involved in nucleotide sugar
transport.

The new protein can find application in modulating the transport of nucleosides and/or
nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-bound
compartment.

Similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP

complete cDNA, complete cds, EST hits 
potential start at 184, 
TRANSMEMBRANE 4 

function in the transport of nucleosides and/or nucleoside derivatives
between the cytosol and the lumen of an intracellular membrane-bound compartment?
Sequenced by Qiagen 
Locus: /map="8" 
Insert length: 2005 bp 

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963 



1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 

51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 

101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 

151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 

201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 

251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 

301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 

351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 

401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT 

451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG

501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 

551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 

601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 

651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 

701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 

751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC 

801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 

851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA 

901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 

951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 

1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT 

1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 

1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 

1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 

1201 TCCCTCTCTT TTGAAAATGT AAAATAAAAC CAAAAATAGA CAACTTTTTC 

1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA 

1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG 

1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 

1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT 

1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG

1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 

1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 

1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 

1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 

1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT 

1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT 

1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 

1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT 

1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 

1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA 

2001 AAAAA 



BLAST Results 



Entry HS012351 from database EMBL: 
human STS SHGC-31823.
Score = 1629, P = 3.1e-67, identities = 343/354



Medline entries 



96199248: 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 



Peptide information for frame 1 



ORF from 184 bp to 861 bp; peptide length: 226 
Category: strong similarity to known protein 



1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 
51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 
101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 
151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT 
201 TVLLPPYDDA TVNGAAKEPP PPYVSA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24ell, frame 1 

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP.,
N = 1, Score = 539, P = 5.3e-52

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06



>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108).

Length = 233

HSPs:

Score = 551 (82.7 bits), Expect = 2.9e-53, P = 2.9e-53
Identities = 102/221 (46%), Positives = 148/221 (66%) 

Query: 9 RFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQY NFSSSELGGDF- 64 

RFYS CC CCHVRTGTI+LG WY+++N ++ ++L + P+ N +G + 

Sbjct: 13 RFYSTRCCGCCHVRTGTIILGTWYMVVNLLMAILLTVEVTHPNSMPAVNIQYEVIGNYYS 72 

Query: 65 -EFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAITVL 123 

E M D N C+ A+S+LM +1 +M YGA + W+I PFFCY++FDF L+ LVAI+ L 
Sbjct: 73 SERMAD-NACVLFAVSVLMFIISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL 131 

Query: 124 IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI 183 

Y I+EY+ QLP +FPY+DD+++++ +CL+ I + L+F ++ + FK YLI+CVWNCY+YI 
Sbjct: 132 TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFI VLVFFALFIIFKAYLINCVWNCYKYI 190 

Query: 184 NGRNSSDVLVYVTSN-DTTVLLPPYDDATVNGAAKEPPPPYVSA 226 

N RN ++ VY +LP Y+ A V KEPPPPY+ A 

Sbjct: 191 NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233 
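
The percentages shown next to the identity and positive counts in the HSP header follow from the raw counts. The sketch below is a hedged illustration, not taken from the BLAST source: the helper name `hsp_percent` is hypothetical, and it assumes the percentage is truncated rather than rounded, which is consistent with 148/221 (66.97 %) being reported as 66%.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated, as in 'Identities = 102/221 (46%)'."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


print(hsp_percent(102, 221), hsp_percent(148, 221))  # 46 66
```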



Pedant information for DKFZphutel_24ell, frame 1 



Report for DKFZphutel_24ell.1

[LENGTH] 226
[MW] 25419.11
[pI] 4.65
[HOMOL] SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 3
[KW] SIGNAL_PEPTIDE 49
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 20.80 %



SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL 

SEG xxxxxxxxxxxxxxxx 

PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc

MEM 

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY 

SEG xxxxxxxxxxxxx 

PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . . 

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA 

SEG 

PRD eecccccccceeeeeecccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphutel_24ell.1

PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 187->191 ASN_GLYCOSYLATION PDOC00001
PS00001 198->202 ASN_GLYCOSYLATION PDOC00001
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00007 186->195 TYR_PHOSPHO_SITE PDOC00007



(No Pfam data available for DKFZphutel_24ell . 1) 
DKFZphutel_24j6



group: cell structure and motility 

DKFZphutel_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell
adhesion regulator (CAR1).

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of
cell-cell adhesion. It contains an RGD cell attachment site.

The new protein can find application in the modulation of cell-cell adhesion.



strong similarity to rat CAR1 and A. thaliana T19C21.5

complete cDNA, complete cds, EST hits 

potential frame shift at Bp 1241 according to CAR1 

but frame shift might be in CAR1 sequence! 

ESTs T73366 AA362984 confirm this sequence 

Sequenced by Qiagen 

Locus: /map="939.9 cR from top of Chr2 linkage group"
Insert length: 3333 bp 

Poly A stretch at pos. 3316, no polyadenylation signal found 



1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 
51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 
101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 
151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 
201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 
251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG 
301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 
351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 
401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG 
451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA 
501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 
551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 
601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG 
651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 
701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 
751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT 
801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG 
851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 
901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 
951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 
1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 
1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 
1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 
1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 
1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 
1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 
1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 
1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 
1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 
1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA 
1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 
1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 
1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 
1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC 
1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT 
1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC 
1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 
1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 
1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 
1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 
2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 
2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 
2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 
2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 
2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 
2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT 
2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT 
2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC 
2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT 
2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA
2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG 
2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA
2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT 
2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT 
2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT 
2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC 
2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG 
2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG 
2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC 
2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT 
3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC 
3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT 
3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA 
3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT 
3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC 
3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA 
3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA 



BLAST Results 



Entry HS389210 from database EMBL: 
human STS SHGC-10164. 
Score = 1592, P = 1.5e-64, identities = 346/364

Entry HS933343 from database EMBL: 
human STS WI-16551. 
Score = 1193, P = 5.7e-46, identities = 241/244



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 



1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL
51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN
101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT
151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS
201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN
251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV
301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA
351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL
401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV
451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN
501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL
551 FACGPDAKEV RKENQANTSV V



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_24j6, frame 3

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.,
N = 1, Score = 1472, P = 7.2e-151

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437,
P = 2.8e-60

TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid
R09B5., N = 2, Score = 323, P = 1.5e-43



>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.
Length = 405




HSPs:

Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151
Identities = 288/319 (90%), Positives = 297/319 (93%)

Query:   1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60
           MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
Sbjct:   1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60

Query:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120
           TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL
Sbjct:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120

Query: 121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180
           L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI
Sbjct: 121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180

Query: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240
           DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK
Sbjct: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240

Query: 241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300
            EE+ELKQL   KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV
Sbjct: 241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300

Query: 301 SYYNQPVFLAGMGLAF-LY 318
           SYYNQPVFL   G  F LY
Sbjct: 301 SYYNQPVFLGWHGPGFPLY 319





Pedant information for DKFZphutel_24j6, frame 3



Report for DKFZphutel_24j6.3

[LENGTH] 571
[MW] 62542.72
[pI] 6.08
[HOMOL] TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus
cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141
[BLOCKS] BL00341D
[PROSITE] MYRISTYL 15
[PROSITE] MITOCH_CARRIER 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 4
[PFAM] Laminin B (Domain IV)
[KW] TRANSMEMBRANE 4
[KW] LOW_COMPLEXITY 8.76 %



SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL

SEG 

PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee 

MEM MMMMMMMMMMMMM 



SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 

SEG .xxxxxxxxxxxxxxxx 

PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI

SEG xxxxxxxxxxxxxxxxxxxxx

PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh 

MEM MMMMMMM 

SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 

SEG 

PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh 

MEM 



SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 

SEG 

PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee 

MEM 



SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF 

SEG 

PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh 



MEM 

SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP 

SEG xxx 

PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc 

MEM 

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL

SEG xxxxxxxxxx 

PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR 

SEG 

PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV

SEG 

PRD eecccccceeeeccccchhhhhhhhcccccc 

MEM 



Prosite for DKFZphutel_24j6.3

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00001 174->178 ASN_GLYCOSYLATION PDOC00001
PS00001 434->438 ASN_GLYCOSYLATION PDOC00001
PS00001 567->571 ASN_GLYCOSYLATION PDOC00001
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005
PS00005 176->179 PKC_PHOSPHO_SITE PDOC00005
PS00005 294->297 PKC_PHOSPHO_SITE PDOC00005
PS00005 487->490 PKC_PHOSPHO_SITE PDOC00005
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 294->298 CK2_PHOSPHO_SITE PDOC00006
PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006
PS00006 403->407 CK2_PHOSPHO_SITE PDOC00006
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006
PS00008 12->18 MYRISTYL PDOC00008
PS00008 65->71 MYRISTYL PDOC00008
PS00008 76->82 MYRISTYL PDOC00008
PS00008 193->199 MYRISTYL PDOC00008
PS00008 267->273 MYRISTYL PDOC00008
PS00008 311->317 MYRISTYL PDOC00008
PS00008 336->342 MYRISTYL PDOC00008
PS00008 339->345 MYRISTYL PDOC00008
PS00008 353->359 MYRISTYL PDOC00008
PS00008 368->374 MYRISTYL PDOC00008
PS00008 373->379 MYRISTYL PDOC00008
PS00008 435->441 MYRISTYL PDOC00008
PS00008 461->467 MYRISTYL PDOC00008
PS00008 490->496 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00013 122->133 PROKAR_LIPOPROTEIN PDOC00013
PS00215 404->414 MITOCH_CARRIER PDOC00189



Pfam for DKFZphutel_24j6.3



HMM_NAME Laminin B (Domain IV)

HMM       * YWR1 PERFLGDQvTs YGGkLe*
          Y+R + LG+++ + G + +
Query 538 YFRFAQNTLGNKLFACGPDAK




DKFZphutel_2h3



group: differentiation/development 

DKFZphutel_2h3 encodes a novel 267 amino acid protein, with similarity to ITM2 (integral 
membrane protein 2) of chicken and mouse. 

The novel protein contains a prenyl group binding site (CAAX box) and seems to be
post-translationally modified by the attachment of either a farnesyl or a geranyl-geranyl group.
The similar Gallus gallus protein E25 is a marker for chondro-osteogenic differentiation.

The new protein can find application as a useful marker for chondro-osteogenic cell 
differentiation and for the modulation of chondro-osteogenic cell differentiation. 



strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 2033 bp 

Poly A stretch at pos. 2007, polyadenylation signal at pos. 1986 



1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCGGCG 
51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG CATCAAGGGC 
101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC 
151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT 
201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG 
251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA 
301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC 
351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA 
401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC 
451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC 
501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC 
551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT 
601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC 
651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG 
701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG 
751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG 
801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG 
851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC 
901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTGC TTAGCTTGTA 
951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC 
1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT 
1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT 
1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG 
1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG 
1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC 
1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG 
1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTGGGGCTT GAGCTGAGGA 
1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC 
1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG 
1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT 
1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC 
1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA 
1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC 
1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA 
1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT 
1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG CTTAGGGGAA 
1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT 
1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC 
1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT 
1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT 
2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG 



BLAST Results 



Entry B64417 from database EMBL: 

CIT-HSP-2023A7.TR CIT-HSP Homo sapiens genomic clone 2023A7 . 
Length = 715 
Plus Strand HSPs: 






Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64 
Identities = 310/311 (99%) 



Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA 
library subtraction. Molecular cloning and characterization of a gene 
belonging to a novel multigene family of integral membrane proteins. 



Peptide information for frame 2 



ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 
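
The peptide length follows directly from the ORF coordinates. A minimal sketch of the arithmetic, assuming the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon (for this clone the TGA triplet sits at positions 857-859, which is consistent with that reading); the function name is hypothetical:

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given inclusive 1-based nucleotide bounds."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3

print(peptide_length(56, 856))  # 267
```

The same calculation gives 393 for an ORF from 110 bp to 1288 bp and 311 for an ORF from 49 bp to 981 bp.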



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_2h3, frame 2 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16)., N = 1, Score = 573, P = 1.3e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 
1, Score = 560, P = 3.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, 
Score = 456, P = 3.3e-43 



>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16). 

Length = 262 

HSPs: 



Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55 
Identities = 117/264 (44%), Positives = 172/264 (65%) 
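
The percentages in the Identities/Positives lines can be recomputed from the counts. The sketch below assumes, based only on the figures listed in this report (for example 310/311 shown as 99%), that the program truncates to a whole percent rather than rounding; the function name is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return 100 * matches // aligned_length

print(blast_percent(117, 264), blast_percent(172, 264))  # 44 65
```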



Query: 1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60 
         MVK+SF A+A + A+K ++ ++L+ P ++P G 
Sbjct: 1 MVKVSFNSALA--HKEAANKEEENS QVLILPPDAKEPEDVVVPAGHKRAWCWC 51 

Query: 61 LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTQM-- 112 
          + G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 
Sbjct: 52 MCFGLAFMLAGVILGGAYLYKYFAFQQGGVYFCGIKYIEDGLSLPESGAQLKSARYH 108 

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172 
           +E++++I + E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT++ 
Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168 

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232 
           V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R 
Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228 

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 
           + + I KR A NC IRHFEN F +ETLIC 
Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 








Pedant information for DKFZphutel_2h3, frame 2 





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Report for DKFZphutel_2h3 . 2 



[LENGTH]   267 
[MW]       30253.96 
[pI]       8.16 
[HOMOL]    SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3- 1e-49 
[PROSITE]  MYRISTYL 4 
[PROSITE]  PRENYLATION 1 
[PROSITE]  CAMP_PHOSPHO_SITE 3 
[PROSITE]  CK2_PHOSPHO_SITE 3 
[PROSITE]  TYR_PHOSPHO_SITE 1 
[PROSITE]  PKC_PHOSPHO_SITE 4 
[PROSITE]  ASN_GLYCOSYLATION 1 
[KW]       TRANSMEMBRANE 1 
[KW]       LOW_COMPLEXITY 15.36 % 



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . . xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG xx 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphutel_2h3 . 2 



PS00001    169->173  ASN_GLYCOSYLATION  PDOC00001 
PS00004    50->54    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004    187->191  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004    232->236  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005    49->52    PKC_PHOSPHO_SITE   PDOC00005 
PS00005    209->212  PKC_PHOSPHO_SITE   PDOC00005 
PS00005    227->230  PKC_PHOSPHO_SITE   PDOC00005 
PS00005    235->238  PKC_PHOSPHO_SITE   PDOC00005 
PS00006    30->34    CK2_PHOSPHO_SITE   PDOC00006 
PS00006    110->114  CK2_PHOSPHO_SITE   PDOC00006 
PS00006    209->213  CK2_PHOSPHO_SITE   PDOC00006 
PS00007    119->127  TYR_PHOSPHO_SITE   PDOC00007 
PS00008    52->58    MYRISTYL           PDOC00008 
PS00008    71->77    MYRISTYL           PDOC00008 
PS00008    138->144  MYRISTYL           PDOC00008 
PS00008    243->249  MYRISTYL           PDOC00008 
PS00294    264->268  PRENYLATION        PDOC00266 
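
To show how entries of this kind arise, the following sketch scans the DKFZphutel_2h3 peptide for the protein kinase C phosphorylation motif, written here as the PROSITE-style pattern [ST]-x-[RK]. It is an illustration under stated assumptions, not the program that produced the report: the `scan` helper is hypothetical, the pattern is translated by hand into a regular expression, and hit ranges are assumed to be given as 1-based start and exclusive end, which is what the 49->52 style of the listing suggests. The peptide is taken from the frame 2 translation, with residues 251-260 read as RHFENTFVVE as in the alignment and SEQ lines.

```python
import re

# DKFZphutel_2h3 peptide (frame 2), 267 residues, in blocks of ten.
PEPTIDE = "".join("""
MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK
RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY
EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR
GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE
EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI
RHFENTFVVE TLICGVV
""".split())

def scan(pattern, seq):
    """Return (start, end) pairs, 1-based start and exclusive end, for every
    match of a regular expression in seq, including overlapping matches."""
    hits = []
    # The lookahead lets finditer report matches that overlap each other.
    for m in re.finditer("(?=(" + pattern + "))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits

# PKC phosphorylation site, [ST]-x-[RK], as a regular expression.
for start, end in scan(r"[ST].[RK]", PEPTIDE):
    print(f"{start}->{end}")
```

Run on this peptide, the scan yields the four ranges listed for PS00005 above (49->52, 209->212, 227->230, 235->238).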



(No Pfam data available for DKFZphutel_2h3 . 2) 






DKFZphmcfl_lall 



group: transmembrane protein 

DKFZphmcfl_lall encodes a novel 393 amino acid protein with weak similarity to S. pombe 
SPBC29A3_3 protein and S. cerevisiae putative membrane protein YDR255c. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary 
carcinoma-specific genes and as a new marker for mammary carcinoma cells. 



similarity to YDR255c and SPBC29A3.03c 
membrane regions: 1 

Summary DKFZphmcfl_lall encodes a novel 393 amino acid protein, with 
similarity to YDR255c and SPBC29A3.03c. 



similarity to YDR255C and SPBC29A3.03C 

complete cDNA, complete cds, EST hits 

potential start at Bp 110 matches Kozak consensus 



Sequenced by DKFZ 

Locus: /map="542.7 cR from top of Chr5 linkage group" 
Insert length: 1819 bp 

Poly A stretch at pos. 1808, no polyadenylation signal found 



1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC 
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT 
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT 
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG 
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC 
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG 
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG 
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC 
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA 
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC 
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT 
551 GTGGACTTGG ATTTCAAGCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA 
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC 
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC 
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA 
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC 
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG 
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT 
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG 
951 AGTCCCCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG 
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG 
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT 
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT 
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC 
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA 
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT 
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGTCT 
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG 
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT 
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA 
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG 
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT 
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC 
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG 
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC 
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA 
1801 ACAAATGTAA AAAAAAAAA 



BLAST Results 



Entry HS579359 from database EMBL: 
human STS WI-6350. 
Score = 1027, P = 9.9e-40, identities = 207/209 






Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 1288 bp; peptide length: 393 
Category: similarity to unknown protein 



1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG 
51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE 
101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL 
151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH 
201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS 
251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN 
301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP 
351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcfl_lall, frame 2 

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P = 
3.4e-42 

PIR:S67312 probable membrane protein YDR255C - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 271, P = 5.3e-22 

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid 
T07D1., N = 1, Score = 193, P = 5.6e-13 



>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c29A3. 
Length = 398 

HSPs: 



Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 55/142 (38%), Positives = 89/142 (62%) 



Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311 
           Y +LD W + F R+ C+ LG+S+ESPL + +G +ALP+L+ + ++++++ 
Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIVVNAGAIALPILLKMSSIMKKKHTE 316 

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371 
           W + ELP+EI L +HSVF CP+ ++Q ++ NPP+ + CGHVI + ++L +L 
Sbjct: 317 --WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374 

Query: 372 G--KLKCPYCPMEQNPADGKRIIF 393 
           G + KCPYCP E AD R+ F 
Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398 




Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 51/221 (23%), Positives = 102/221 (46%) 




Query: 22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81 
          G C L EL +++L+P++LVCK+ L K 
Sbjct: 15 GNKCLAKLNEL ESILKDAKKSCLKD-PTTSMKELVA--CSEKTQQVFDDLKRTEKK 67 

Query: 82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141 
          H+S++R GK +++ F+ ++ + + +++++++ + A+ H ++QG + +A C 
Sbjct: 68 FHTSLNRFGKTLEKKFNFDLEDIKLHSSFESKKRE IDTALSLHFFRQGDVELAHLFC 124 

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201 
           +E+ + + F L I++ + ++DL +EWA R L SSLE+ L + 
Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184 

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242 
           + K+A+YR+ F+H +IQ M +L + 
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SbjCt: 185 VSNYL — TTKDIMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225 
Pedant information for DKFZphmcfl_lall, frame 2 
Report for DKFZphmcfl_lall.2 



[LENGTH]   393 
[MW]       44414.77 
[pI]       6.15 
[HOMOL]    TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YDR255c] 6e-23 
[PIRKW]    transmembrane protein 2e-21 
[PROSITE]  MYRISTYL 2 
[PROSITE]  AMIDATION 1 
[PROSITE]  CK2_PHOSPHO_SITE 3 
[PROSITE]  PROKAR_LIPOPROTEIN 1 
[PROSITE]  TYR_PHOSPHO_SITE 3 
[PROSITE]  PKC_PHOSPHO_SITE 1 
[PROSITE]  ASN_GLYCOSYLATION 1 
[KW]       TRANSMEMBRANE 1 



SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh 

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS 

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 

PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI 

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhccccccccccccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF 

PRD eehhhhhhhccccccccccccccchhhhhcccc 

MEM 



Prosite for DKFZphmcf l_lall . 2 



PS00001    189->193  ASN_GLYCOSYLATION   PDOC00001 
PS00005    180->183  PKC_PHOSPHO_SITE    PDOC00005 
PS00006    28->32    CK2_PHOSPHO_SITE    PDOC00006 
PS00006    135->139  CK2_PHOSPHO_SITE    PDOC00006 
PS00006    190->194  CK2_PHOSPHO_SITE    PDOC00006 
PS00007    211->219  TYR_PHOSPHO_SITE    PDOC00007 
PS00007    27->36    TYR_PHOSPHO_SITE    PDOC00007 
PS00007    244->253  TYR_PHOSPHO_SITE    PDOC00007 
PS00008    37->43    MYRISTYL            PDOC00008 
PS00008    50->56    MYRISTYL            PDOC00008 
PS00009    387->391  AMIDATION           PDOC00009 
PS00013    282->293  PROKAR_LIPOPROTEIN  PDOC00013 



(No Pfam data available for DKFZphmcf l_lall . 2) 
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DKFZphmcfl_lc23 



group: mammary carcinoma derived 

DKFZphmcfl_lc23.1 encodes a novel 311 amino acid proline-rich protein. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary 
carcinoma-specific genes. 



unknown, proline rich protein 

complete cDNA, complete cds? potential start at Bp 50, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3077 bp 

Poly A stretch at pos. 3067, polyadenylation signal at pos. 3048 



1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT 
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG 
101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT 
151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA 
201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC 
251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT 
301 GGGCCTCCCA GGGAGGACGT AGGTGCGCCC CTGGTCACGC CCTCGCTCCT 
351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC 
401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA 
451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT 
501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC 
551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC 
601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA 
651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG 
701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG 
751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC 
801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC 
851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG 
901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT 
951 CCCGGGCTCA GACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC 
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT 
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC 
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC 
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT 
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA 
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG 
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC 
1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT 
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG 
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT 
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA 
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG 
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT 
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC 
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG 
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT 
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT 
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG 
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT 
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT 
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT 
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC 
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT 
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC 
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC 
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT 
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC 
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG 
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT 
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT 
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC 
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC 
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT 
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA 
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA 
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2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC 
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG 
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAAAAGC CTCTGATTTT 
2901 TAATTTCCAT AAAATGTTAG AAGTATATAT ATACATATAT ATATTTCTTT 
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA 
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT 
3051 TAAATGATTG AAACTTGAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 



1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 
51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 
101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 
151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 
201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 
251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcfl_lc23, frame 1 

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P = 
6.1e-15 

PIR:A28996 proline-rich protein M14 precursor - mouse, N = 1, Score = 
191, P = 3.8e-13 



>PIR:S49915 extensin-like protein - maize 
Length = 1,188 

HSPs: 



Score = 215 (32.3 bits), Expect = 6.1e-15, P = 6.1e-15 
Identities = 81/269 (30%), Positives = 115/269 (42%) 



Query: 5 PPPEEAFFS VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP-- DPPP A 55 
         PPP S V SP P P SP PA +SS ++ PP +P PPP + 
Sbjct: 598 PPPPAPVASPPPPVKSPPPPTPVASPP PPAPVASSPPPMKSPPPPTPVSSPPPPEKS 654 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 
          PPPPASP + P P K PP++P+PS + P 
Sbjct: 655 PPPPPPAKSTPPP-EEYPT--PPTSVKSSPPPEKSLPPPTLIPSPPPQEKPTPPSTPSKP 711 

Query: 116 PTPALGPSAPQKPLRRA-LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 
           P+ PS P++P+ + ++SP PAP S +LA S + + PP 
Sbjct: 712 PSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPPPTPVSSPPALAPVSSPPSVKSSPPPA 771 

Query: 175 AEPRPPQSPASTASFIFSKGSRKLQLERPV-SPETQADLQRNLVAELRSISEQRPPQAPK 233 
           PP +P +S +Q+ P +P++ L V+ + + PP AP 
Sbjct: 772 PLSSPPPAPQVKSS PPPVQVSSPPPAPKSSPPLAP-- VSSPPQVEKTSPPPAPL 823 

Query: 234 KSPKAPPPVARKPSVGV--PPPASPSYPRAEPLTAPPTNGLP 273 
           SP P + P V V PPP S P P+++PP P 
Sbjct: 824 SSPPLAPK-SSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 




Score = 206 (30.9 bits), Expect = 9.1e-14, P = 9.1e-14 
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Identities = 82/261 (31%), Positives = 108/261 (41%) 



Query: 17 PEPAG-PSGSPELVSSPAASS---SSATALQIQPPGSPDPPPAP---PAPAPASSAPGHV 69 
          P P G P SP + PAAS+ ST + P P+P P P P P P +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 
          + P PV G S P V P + +V+L AP G+P P + + + P P 
Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188 
           + G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP--PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVARKPS- 247 
           + S L P P ++ VA + PP P SP P PVA P 
Sbjct: 578 PVKSPPPPTLVASPP--PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635 

Query: 248 VGVPPP ASPSYPRAEPLTAPPTNGLPHTQD 277 
           + PPP +SP P P PP P + + 
Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669 




Score = 202 (30.3 bits), Expect = 2.9e-13, P = 2.9e-13 
Identities = 81/254 (31%), Positives = 110/254 (43%) 




Query: 16 SPEPAGPSGSPELV--SSP -- AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70 
          SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+ 
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKPA SPPAHVS 872 

Query: 71 KLPQ KEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQ 126 
          P+ P + PP E +P TP L ++S P +P + P + 
Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932 

Query: 127 KPLRRAL SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 
           P+ + + ++SP PAP S A K+ A L P PPE + PP +P 
Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP-- KSSPPPAPVNL P-- PPEVKSSPPPTP 984 

Query: 184 ASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVA 243 
           S+ + P PE ++ V+ + PP AP SP PPPV 
Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PPPVK 1042 

Query: 244 RKPS VGVPPPASPSYPRAEPLTAPP 268 
           P V PPP S P P+++PP 
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070 




Score = 190 (28.5 bits), Expect = 7.9e-12, P = 7.9e-12 
Identities = 74/264 (28%), Positives = 111/264 (42%) 




Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63 
         PPP S PE + P P +P + T+++ PP PP P+P 
Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698 

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123 
          P K P K PP+E V +P TP V +P PTP P 
Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK -- SSPPPAPVSSP--PPTPVSSPP 753 

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 
           A P+ S ++SP PAP S A + + K+ + + + P PP + PP +P 
Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP--PPAPKSSPPLAP 806 

Query: 184 ASTASFIFSKGSRKLQLERP-VSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPV 242 
           S+ + LP + + P++ +V+ + + PP AP SP P 
Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP--HVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268 
           A P+ V PP P++P P +EP ++PP 
Sbjct: 865 ASPPAHVSSPPEVVKPSTPPAPTTVISPPSEPKSSPP 901 




Score = 189 (28.4 bits), Expect = 1.0e-11, P = 1.0e-11 
Identities = 86/271 (31%), Positives = 112/271 (41%) 




Query: 5 PPPEEAFFSVASPEPAGPSGSPEL-VSSP -- AASSSSATALQIQPPG--SPDPPPAP 56 
         PPP A S P P S P + VSSP A SS A PP PPPAP 
Sbjct: 768 PPP--APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 825 

Query: 57 PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 116 
          P AP SS P V P PV S PP V +P +TP V +P 
Sbjct: 826 PPLAPKSSPPHVVVSSPP -- PVVKSS PPPAPVSSPPLTPKPASPPA--HVSSPPEVV 878 

Query: 117 TPALGPSAPQKPLRRALSGRAS.PVPAPSSGLHAAVRLKAC-SLAASEGL SSAQP 169 
           P+ P AP + ++SP P P S V+ ++ +S + SS P 
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SbjCt: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPWV 937 

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228 

+ PP + PP +P S+ + P PE ++ V+ + P 

Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996 

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P AP SP PPP + P V PPP S P P+++PP 

SbjCt: 997 PPAPMSSP— PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038 

Score = 181 (27.2 bits), Expect = 8.8e-11, P = 8.8e-11 
Identities = 73/277 (26%), Positives = 105/277 (37%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKL PQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111 

PPAP + SPV++ PKP + GPP+ P P ++S 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584 

Query: 112 PG--GAPTPALGPSAPQKPLRRA LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166 

P +PP+ PP+ + PPS AV ++ + 

SbjCt: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644 

Query: 167 AQPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQ 226 

PPE P PP PA + + ++ PE L+ + 

SbjCt: 645 VSSPPPPEKSP-PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702 

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP P K P +P P K V PP S P P+++PP 
SbjCt: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 78/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP — DPPPAP-- 56 

PPP + P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEVVK-PSTPPAPTTV — ISPPSEPKSSPPPTPVS 906 

Query: 57 -PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

P P SS P + P .P PP V +P P++ V +P 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPWVSSP — PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P + P+ * P ++SP PPS A + S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV KSSPPPTPVSSPPPAPK SSPPPAPMSSP-P — PPEV 1009 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

+ PP +P S+ + p p ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 

Query: 236 PKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P PPPV P V PPP S P P+++PP 
Sbjct: 1069 P— PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 82/267 (30%), Positives = 110/267 (41%) 

Query: 17 PEPAG-PSGSPELVSSPAASS— -SSATALQIQPPGSPDPPPAP— -PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P PPP +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528 

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPvSVVSPPPPVKSPPPPAPVG SPP— PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPV— -SPETQADLQRNLVAELRS ISEQRPPQA PK 233 

+ S L P SPA+ + ++S ++ PP P 

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636 

Query: 234 KSPKAPPPVARKP SVGVPPPASPSYPRAEPLTAPPTN 270 

KSP P PV+ P PPP + S P E PPT+ 

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTS 676 

Score = 170 (25.5 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 78/279 (27%), Positives = 108/279 (38%) 
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Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64 

pp S S + P +? + P SS A+ PP +P +PP P SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS--SPP-PVWSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG — GAPTPALGP 122 

P V P PV PP +P PL ++S P +P PA 

SbjCt: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG--RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

SbjCt: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS--P — PPPVKSPPP 104 6 

Query: 181 QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 240 

+P S+ + P P ++ V+ + PP AP SP PP 

Sbjct: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PP 1103 

Query: 241 PVARKPS---VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA PS P P+ + + PP P + ++ L 
SbjCt: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits), Expect = 2.1e-09, P = 2.1e-09 
Identities = 75/266 (28%), Positives = 104/266 (39%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 
SbjCt: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP— PPPEKSPPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P P P P ++ P PAP + V+ S ++S P P + 

SbjCt: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP— PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK--PSVGVPPPASPSYPRA--EPLTAPP 2 68 

P +PPP + PS PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits), Expect = 2.7e-09, P = 2.7e-09 
Identities = 75/267 (28%), Positives = 102/267 (38%) 

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

SbjCt: 496 ASTPPP— SLVKLSPPQAPVGSPPPPVKTTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

Sbjct: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

SbjCt: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 179 PP- -QS PASTAS FIFSKGSRKLQLERPV— SPETQADLQRNLVAELRS IS EQRPPQAPK 233 

PP+PSSKLPSPQ S ++P +P 

Sbjct: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP — SPP 721 

Query: 234 KSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
SbjCt: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits), Expect = 4.6e-09, P = 4.6e-09 
Identities = 81/268 (30%), Positives = 108/268 (40%) 

Query: 5 PPPEEAF FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP— 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 --APPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP +S+P + P PV K PP P ++S 

SbjCt: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679 

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P +PPLPSP P + + ++P PSS + + S SS 
Sbjct: 680 SPPPEKSLPPPTLIPSPP— PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 
Query: 168 QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227 

P PPSP + A+ SSK P+P+ ++ + 

Sbjct: 737 

Query: 228 

P+A P V PP 

Sbjct: 794 

Score = 165 (24.8 bits), Expect = 6.0e-09, P = 6.0e-09 
Identities = 79/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAG-PSGSP--ELVSSPAASSSSATALQIQPPGSPDPPP-APPAPA 60 

PPP + + + P P G PS P +VS PS P GSP PP +PP PA 

Sbjct: 517 

Query: 61 --GGPPREDVGAP LVTPSLLQMVRLRS VGAPGG 114 

PP V +P + +P V AP 

Sbjct: 571 

Query: 115 

Sbjct: 631 

Query: 172 

P SPE + + V+ + PP A 

Sbjct: 689 -PSSPEKPSP-PKEPVSSPPQTPKSSPPPA 739 

Query: 232 KSPKAPPPVARKPSVGV--PPPASPSYPRAEPLTAPP 268 

SP P PV+ P++ PP+ S P PL++PP 

Sbjct: 740 

Score = 162 (24.3 bits), Expect = 1.3e-08, P = 1.3e-08 
Identities = 76/272 (27%), Positives = 99/272 (36%) 

Query: 2 

PP P P +PPA + 

Sbjct: 427 

Query: 61 GPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTP-- 118 

PP+ VG+P P . V+ S AP G+P+P 

Sbjct: 487 -PPQAPVGSP--PPP VKTTSPPAPIGSPSPPP 536 

Query: 119 --ALGPSAPQK-PLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

+ PPKP AGSPP S A S ++PP 

Sbjct: 537 PVSVVSPPPPVKSPPPPAPVG--SPPPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPP 594 

Query: 175 

Sbjct: 595 

Query: 235 

Sbjct: 654 

Score = 159 (23.9 bits), Expect = 2.8e-08, P = 2.8e-08 
Identities = 77/264 (29%), Positives = 103/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP--DPPPAP PAP 59 

PPP V+SP P P SP P SS ++ PP +P PP P P P 

Sbjct: 916 PPPA MVSSP-PMTPKSSPP PVVVSSPPPTVKSSPPPAPVSSPPATPKSSPPP 966 

Query: 60 

AP + P V 

Sbjct: 967 

Query: 120 

++ P PAP S 

Sbjct: 1025 

Query: 176 

Sbjct: 1085 

Query: 236 

Sbjct: 1136 

Score = 143 (21.5 bits), Expect = 1.8e-06, P = 1.8e-06 
Identities = 59/179 (32%), Positives = 77/179 (43%) 









Query: 3 DFPPPEEAFFSVASPEP-AGPSGSPELVSSPAASSSSATA-LQIQPPGSP-- DPPP A 55 

+ PPPE S P P + P +P+ PA SS + + PP +P PPP + 

Sbjct: 970 NLPPPEVK--SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 1027 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PP PAP SS P V P PV PP + P S V+ AP + 

Sbjct: 1028 PPPPAPVSSPPPPVKSPPPPAPVSSPP PPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 116 PTPALGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P P + P P+ ++ P PAP S A +K SL +SS P PP 

Sbjct: 1085 PPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPPAP-VKPPSLPPPAPVSS— P— PPV 1139 

Query: 175 AEPRPPQ 181 
P PP+ 

Sbjct: 1140 VTPAPPK 1146 

Score = 133 (20.0 bits), Expect = 2.3e-05, P = 2.3e-05 
Identities = 50/132 (37%), Positives = 59/132 (44%) 

Query: 1 MADFPPPEEAFFSVASPEPAGP-SGSPELVSSP— -AASSSSATALQIQPPGSP--DPPP 54 

M+ PPPE V SP P P S P V SP A SS + + PP +P PPP 

Sbjct: 1001 MSSPPPPE VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPP 1055 

Query: 55 — APPAPAPASSAPGHVAKLPQKEPVGCSKG— GGPPREDVGAPLVTPSLLQMVRLRS 108 

+ PP PAP SS P V P PV PP V +P P + 

Sbjct: 1056 PVKSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP— PPPIKSPPPPAP 1113 

Query: 109 VGAPGGAPT- - PALGPS AP 125 

V +P AP P+L P AP 
Sbjct: 1114 VSSPPPAPVKPPSLPPPAP 1132 

Score = 110 (16.5 bits), Expect = 8.0e-03, P = 8.0e-03 
Identities = 41/121 (33%), Positives = 49/121 (40%) 

Query: 5 PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP- -DPPP 54 

PPP S V SP P PS P V SP A SS ++ PP +P PPP 

Sbjct: 1060 PPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPP 1119 

Query: 55 AP PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

AP P PAP SS P V P K+ + PP E P + L + 

Sbjct: 1120 APVKPPSLPPPAPVSSPPPVVTPAPPKKE EQSLPPPAESQPPPSFNDI ILPPIMANK 1176 

Query: 109 VGAP 112 
+ P 

Sbjct: 1177 YASP 1180 

Score = 108 (16.2 bits), Expect = 1.3e-02, P = 1.3e-02 
Identities = 46/155 (29%), Positives = 67/155 (43%) 

Query: 114 GAPTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVR-LKACS-LAASEGLSSAQPNG 171 

G PTP GP + P + A S +P+P+P + +LS + A+ P+ 
Sbjct: 408 GYPTPGGGPPSSPVPGKPAAS APMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHS 4 64 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQ ADLQRNLVAELRSISEQR 227 

PP + PP P S + S ++Q +P + Q + + + 

Sbjct: 465 PPADDYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP AP SP PPPV SV PPP S P P+ +PP 
Sbjct: 525 PP-APIGSPSPPPPV SVVSPPPPVKSPPPPAPVGSPP 560 

Pedant information for DKFZphmcf1_1c23, frame 1 

Report for DKFZphmcf1_1c23.1 

[LENGTH] 311 
[MW] 31534.58 
[pI] 9.48 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 38.59 % 

SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxx .... xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 

SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL 
SEG xxxxxx xxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc 

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxxxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 

SEG xxxxx xxxxxxxxxxxxxxx 

PRD ccccccceeeecccchhhhhccccccchhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceeeccccccccc 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 



(No Prosite data available for DKFZphmcf1_1c23.1) 
(No Pfam data available for DKFZphmcf1_1c23.1) 
DKFZphmcf1_1e15 



group: transmembrane protein 

DKFZphmcf1_1e15 encodes a novel 454 amino acid protein with similarity to C. elegans proteins 
and transporter proteins. 

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and 
the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unknown 
compound. 

The new protein can find application as a new transporter in eukaryotic cells, e.g. in drug 
transport into cells. 

similarity to D-XYLOSE TRANSPORTER 
membrane regions: 9 

complete cDNA, complete cds, EST hits 

matches cDNA encoding cell growth inhibiting factor (E12646) 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1957 bp 

Poly A stretch at pos. 1947, polyadenylation signal at pos. 1929 



1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 
51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 
101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 
151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 
201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 
251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 
301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 
351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 
401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 
451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC 
501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 
551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 
601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 
651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 
701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 
751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC 
801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 
851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 
901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 
951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 
1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 
1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT 
1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC 
1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 
1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 
1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 
1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 
1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 
1401 GCTGGGTGAT GCTGGGAGCC CCTACCTCAT TGGCCTGATC TCTGACCGCC 
1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 
1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 
1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 
1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 
1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 
1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 
1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC 
1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 
1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 
1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 
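
The reported polyadenylation signal can be cross-checked against the sequence listing with a short scan for the canonical AATAAA hexamer. The Python sketch below is illustrative only: the function name `find_motif` and the `first_pos` parameter are assumptions made here, not part of the original annotation pipeline. On the last two rows of the listing, the hexamer starts at base 1930 in the 1-based numbering of the block, one more than the reported position 1929, which suggests (without proving) that the report counts positions from zero.

```python
def find_motif(seq: str, motif: str = "AATAAA", first_pos: int = 1) -> list[int]:
    """Return the start position of every (possibly overlapping) motif hit.

    first_pos is the coordinate of the first base of `seq`, so that hits
    are reported in the numbering of the sequence listing.
    """
    seq = seq.replace(" ", "").upper()   # drop the 10-base column spacing
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(first_pos + i)
        i = seq.find(motif, i + 1)       # step by one to allow overlaps
    return hits


# Rows 1901 and 1951 of the listing above (hypothetical usage)
tail = "CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA" "AAAAAAA"
signal_positions = find_motif(tail, first_pos=1901)
```

The same call with `motif="AAAAAAA"` gives a crude way to locate the start of the poly A stretch.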



BLAST Results 



Entry E12646 from database EMBL: 

cDNA encoding cell growth inhibiting factor. 

Score = 3046, P = 2.2e-131, identities = 640/659 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 
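
The ORF coordinates and the peptide length are related by simple arithmetic: 1701 - 340 + 1 = 1362 bases, and 1362 / 3 = 454 codons, which equals the stated peptide length. This implies that the reported end coordinate does not include the stop codon. A minimal sketch of that check, with a hypothetical helper name and assuming 1-based inclusive coordinates:

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive base coordinates."""
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError(f"ORF {start}..{end} is not a whole number of codons")
    return n_bases // 3


peptide_length = orf_peptide_length(340, 1701)
```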



1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI 
51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY 
101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP 
151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV 
201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI 
251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL 
301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV 
351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG 
401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA 
451 SVLI 
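
The peptide listing can be checked against the nucleotide listing by translating the ORF with the standard genetic code. The following Python sketch is an illustration, not the tool used to produce the report; the names `CODON_TABLE` and `translate` are chosen here for the example. It builds the standard codon table from the conventional TCAG ordering and translates the first 30 bases of the ORF (positions 340 to 369 of the cDNA), which should reproduce the first ten residues of the peptide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 up to (not including) the first stop codon."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 340-369 of the cDNA listing, i.e. the start of the ORF
orf_start = "ATGGCCGGGTCCGACACCGCGCCCTTCCTC"
first_residues = translate(orf_start)
```

Comparing such a translation with the printed peptide is also a practical way to spot character-recognition errors in either listing.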

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1e15, frame 1 

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4, 
N = 3, Score = 441, P = 5.2e-76 

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid 
C39E9, N = 2, Score = 449, P = 8.2e-69 

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5, 
N = 3, Score = 413, P = 9.1e-60 

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein"; 
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII 
project), N = 3, Score = 193, P = 2.5e-24 

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER)., N 
= 1, Score = 180, P = 7.9e-11 



>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9 
Length = 488 

HSPs: 

Score = 449 (67.4 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69 
Identities = 88/204 (43%), Positives = 125/204 (61%) 

Query: 58 SALIVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 117 

+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W 

Sbjct: 29 AGVLTQVQTYYNISDSLGGLIQTVFLISFMVFSPVCGYLGDRFNRKWIMIIGVGIWLGAV 88 

Query: 118 LGSSFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGS 177 

LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS 

Sbjct: 89 LGSSFVPANHFWLFLVLRSFVGIGEASYSNVAPSLISDMFNGQKRSTVFMIFYFAIPVGS 148 

Query: 178 GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVER----HSDLPPL 233 

GLG+I GS V + G W W +RV + G++ ++ L L EP RGA ++ D+ 

Sbjct: 149 GLGFIVGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVVVT 208 

Query: 234 NPTSWWADLRALARNLIFGLITCLTG 259 

T++ DL L + L+ C G 

Sbjct: 209 TNTTYLEDLVILLKTPT--LVACTWG 232 

Score = 267 (40.1 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69 
Identities = 74/212 (34%), Positives = 113/212 (53%) 

Query: 249 LIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL 300 

L FG IT G++GV G +S+ L R RA PLV G L +APFL + + 

Sbjct: 277 LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 336 

Query: 301 ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 360 

S+V YI IF G T + NW + D+L V+ P RRSTA ++ +++SHL GDA 
Sbjct: 337 IFGDKSLVLLYIMIFFGITFMCFNWGLNIDMLTTVIHPNRRSTAFSYFVLVSHLFGDASG 396 

Query: 361 PYLIGLISDRLRRN-- WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRR-- 416 

PYLIGLISD +R +P ++ +L + C + L + +++ + +DR+ 

Sbjct: 397 PYLIGLISDAIRHGSTYPKD QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 453 

Query: 417 RAQLHVQGLLHEA--GSTD--DRIVVPQRGRSTRV 447 

RA++ + L + STD +RI + S+R+ 
Sbjct: 454 RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 488 

Score = 70 (10.5 bits), Expect = 5.9e-24, Sum P(2) = 5.9e-24 
Identities = 25/89 (28%), Positives = 41/89 (46%) 

Query: 62 VAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT--LG 119 

V L +NLLNY+DR+TVA ++ + LG +L+ +S V LG 

Sbjct: 11 VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI — SFMVFSPVCGYLG 68 

Query: 120 SSFIPGEHFWLLLLTRGLVGVGEASYSTIAP 150 

F W++++ G + +G S+ P 

Sbjct: 69 DRF NRKWIMIIGVG-IWLGAVLGSSFVP 95 
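
The percentages in the HSP headers above follow from the two counts given with them. In every header of this report the value matches the ratio rounded down rather than to the nearest integer (for example 74/212 is 34.9%, printed as 34%). The sketch below reproduces that arithmetic. The regular expression and function names are assumptions made for illustration, and the parser expects header lines in the cleaned `Identities = a/b` form.

```python
import re

# Matches "Identities = 74/212" and "Positives = 113/212" in an HSP header line
HSP_COUNT = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")


def percent_floor(matches: int, length: int) -> int:
    """Percentage rounded down, as the report's headers appear to do."""
    return 100 * matches // length


def parse_hsp_header(line: str) -> dict[str, int]:
    """Recompute the percentages from the raw counts in an HSP header line."""
    return {
        name: percent_floor(int(matches), int(length))
        for name, matches, length in HSP_COUNT.findall(line)
    }


header = "Identities = 74/212 (34%), Positives = 113/212 (53%)"
recomputed = parse_hsp_header(header)
```

Recomputing the value this way is useful when the printed percentage has been damaged, since the counts are usually easier to read reliably than the bracketed figure.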

Pedant information for DKFZphmcf1_1e15, frame 1 

Report for DKFZphmcf1_1e15.1 

[LENGTH] 454 
[MW] 49013.35 
[pI] 7.66 
[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4 2e-51 
[BLOCKS] BL01022D 
[PROSITE] MYRISTYL 11 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[KW] TRANSMEMBRANE 8 
[KW] LOW_COMPLEXITY 15.42 % 

SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL 

SEG xxxxxxxxxxxxxxxx 

PRD cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh 
MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS 

SEG 

PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG 

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce 

MEM HMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPLNPTSWWA 

SEG xxxxxxxxxxxxx 

PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh 

MEM MMMMMMMMM 

SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL 

SEG XXXXXXXXXXXXXXXX 

PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec 
MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 

SEG 

PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc 

MEM MMMM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMM 

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL 
SEG XXXXXXXXXXXXX 

PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh 
MEM MMMMMMMM MM 

SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVLI 
SEG 

PRD hhhhhhhhccccceeeeeeccccccceeeeeccc 
MEM MMMMMMMMMMMMMMMMMMMMM^^ 



Prosite for DKFZphmcf1_1e15.1 

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005 
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006 
PS00008 26->32 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 52->58 MYRISTYL PDOC00008 
PS00008 139->145 MYRISTYL PDOC00008 
PS00008 176->182 MYRISTYL PDOC00008 
PS00008 252->258 MYRISTYL PDOC00008 
PS00008 262->268 MYRISTYL PDOC00008 
PS00008 266->272 MYRISTYL PDOC00008 
PS00008 288->294 MYRISTYL PDOC00008 
PS00008 305->311 MYRISTYL PDOC00008 
PS00008 397->403 MYRISTYL PDOC00008 
PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013 

(No Pfam data available for DKFZphmcf1_1e15.1) 
DKFZphmcf1_1g13 



group: mammary carcinoma derived 

DKFZphmcf1_1g13 encodes a novel 573 amino acid protein with very weak similarity to the human 
KIAA0543 protein and Musca domestica Hermes transposase. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of mammary 
carcinoma-specific genes. 

similarity to KIAA0766 

complete cDNA, complete cds, few EST hits 

on genomic level encoded by AC005020, no splicing, genomic? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2210 bp 

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176 

1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA 

51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC 

101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG 

151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT 

201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT 

251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT 

301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT 

351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG 

401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC 

451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTACAC GGCTGCAGTC 

501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT 

551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA 

601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA 

651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT 

701 GGAAACATTG TAAAGGAATT TCAAGTCATG GAACAGCAAA TATGACCGGA 

751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC 

801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA 

851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT 

901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC 

951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT 

1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG 

1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA 

1101 AGACGACATT TGGGTAACAA AATTGGCATA TTTAAGTGAT ATTTTTGGCA 

1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT 

1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA 

1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT 

1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA 

1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA 

1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA 

1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG 

1501 GAGCCTGAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT 

1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA 

1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA 

1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT 

1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG 

1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA 

1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT 

1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA 

1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA 

1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA 

2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG 

2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG 

2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT 

2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG 

2201 AAAAAAAAAA 



BLAST Results 



Entry AC005020 from database EMBL: 

Homo sapiens clone GS259H13; HTGS phase 1, 4 unordered pieces. 
Score = 9110, P = 0.0e+00, identities = 1822/1822 

Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from 94 bp to 1812 bp; peptide length: 573 
Category: similarity to unknown protein 



1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL 
51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK 
101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI 
151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK 
201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS 
251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE 
301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI 
351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP 
401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI 
451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI 
501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM 
551 RVALSSCVPD WKELMNRQAH PSH 

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW: 
gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens 
PAC clone DJ0751H13 from 7q35-qter, complete sequence. 
Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179 

Entry MD36211_1 from database TREMBL: 
product: "Hermes transposase"; Musca domestica Hermes transposase 
gene, complete cds. 
Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465 



Alert BLASTP hits for DKFZphmcf1_1g13, frame 1 

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P 
= 1.1e-23 

>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo 
sapiens mRNA for KIAA0766 protein, complete cds. 
Length = 607 

HSPs: 

Score = 300 (45.0 bits), Expect = 1.1e-23, P = 1.1e-23 
Identities = 120/485 (24%), Positives = 229/485 (47%) 

Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147 

CM+ ++R + + L+ + LS + +RI +I ++L L R + +++ LD+ 

Sbjct: 124 

Query: 148 

LLV++R V + + EDLL +NL H + G + 

Sbjct: 183 

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWN--HC--FIHREALVSKEISPSLMDVL 261 

G+++ T M G++S L + E + WN H F+H E L S ++ + ++ 

Sbjct: 241 

Query: 262 

+ WL +GK L ++ LR E+ 

Sbjct: 299 

Query: 321 

+ F D W+ +L DI L ELS +++ 

Sbjct: 359 

Query: 381 TLLLWQARLKSNRPSYYMFPTLLQHIEE NI INEDCLKEIKLEILLHLTSLSQTFNY 436 

L L+Q ++ + FP L + ++E N +E + ++++ L + F 

SbjCt: 418 KLNLFQRHIEEKNLTD — FPALREWDELKQQNKEDEKIFDPDRYQMVI — CRLQKEFER 473 

Query: 437 YFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILSL 495 

+F + +F +K+++ + +PF F+ + I + +E L +L + + L N Y+I L 

Sbjct: 474 HFKDLRF--IKKDLELFSNPFNFKPEYAP1SVRVE LTKLQANTNLWNEYRI KDL 525 

Query: 496 SAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDMR 551 

F+ + + +p++ + + F + +CE FS LTR + L R 

SbjCt: 526 GQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALFR 585 

Query: 552 VALSSCVPDWKELMNRQAHPSH 573 

VA + P W +L+ R+ + S+ 
Sbjct: 586 VATTEMEPGWDDLV-RERNESN 606 

Score = 290 (43.5 bits), Expect = 1.5e-22, P = 1.5e-22 
Identities = 120/485 (24%), Positives = 228/485 (47%) 

Query: 89 CMD-MVRTIFDDKSADKLRTI PLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147 

CM+ ++R + + L+ + LS + +RI +1 ++L L R + ++ + LD+ 

SbjCt: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182 

Query: 14 8 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205 

+A LLV++R V + + EDLL +NL H + G + LE+ L L+ + 
SbjCt: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES--LQTAGLSLQR 240 

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNHCFIHREALVSKEISPSLMDV-LKNA 2 64 

G+++ T M G++S L + E + WN IH + E+ S DV + 
SbjCt: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWN— VIHYSGFLHLELLSSY-DVDVNQI 297 

Query: 265 VKTVN FIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNE 319 

+ T++ IK + + +E H + + WL +GK L ++ LR E 

SbjCt: 298 INTISEWIVLIKTRGVRRPEFQTLLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKE 357 

Query: 320 I YIFLVEKQSHLANIFEDDIWVTKLAYLSDI FGILNELSLKMQGKNNDIFQYLEHILGFQ 379 

+ FLV + + F D W+ +L DI L ELS ++ + +HI F+ 

SbjCt: 358 MEAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFE 416 

Query: 380 KTLLLWQARLKSNRPSYYMFPTLLQHIEENI INEDCLKEIKL EILLHLTSLSQTFN 435 

L L+Q ++ + FP L + ++E ++ + + K+ ++L+F 
SbjCt: 417 VKLNLFQRHIEEKNLTD—FPALREVVDE— LKQQNKEDEKIFDPDRVQMVICRLQKEFE 472 

Query: 436 YYFPEEKFESLKENIWM-KDPFAFQNPESI IELNLEPEEENELLQLSSSFTLKNYYKILS 494 

+F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I 

SbjCt: 473 RHFKDLRF — IKKDLELFSNPFNFKPEYAPISVRVE LTKLQANTNLWNEYRI KD 524 

Query: 4 95 LSAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA PDM 550 

L F+ + + +P++ + + F + +CE FS LTR + L 
SbjCt: 525 LGQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALF 584 

Query: 551 RVALSSCVPDWKELMNRQAHPSH 573 

RVA + P W +L+ R+ + S+ 
SbjCt: 585 RVATTEMEPGWDDLV- RERNESN 606 

Pedant information for DKFZphmcf1_1g13, frame 1 

Report for DKFZphmcf1_1g13.1 

[LENGTH] 573 
[MW] 66276.85 
[pI] 5.82 
[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens 
mRNA for KIAA0766 protein, complete cds. 1e-18 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 8.90 % 

SEQ MTPESRDTTDLSPGGTQEMEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA 
SEG xxxxxxx 



PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh 
SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDMVRTIFDDKSADKLRTIPLSDNTISRRIC 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhcccccceeeeecccchhhhhhh 

SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS 

SEG 

PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce 

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH 

SEG 

PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee 

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE 

SEG 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhhh 

SEQ VRWLSQGKVLSRVYELRNEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhchhhhhcccceeehhhhhhhhhhhhhhhhhhh 

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK 

SEG xxxxx 

PRD hhccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhcchhhhhh 

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL 

SEG xxxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh 

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK 

SEG xxx xxxxxxxxxxx 

PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh 

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH 

SEG 

PRD hcccccccccceeeccccccchhhhhhhccccc 



Prosite for DKFZphmcf1_1g13.1 

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001 
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005 
PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005 
PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005 
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006 
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006 
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006 
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006 
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006 
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006 
PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007 
PS00008 137->143 MYRISTYL PDOC00008 
PS00008 273->279 MYRISTYL PDOC00008 
PS00008 289->295 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphmcf1_1g13.1) 
DKFZphtes3_14g5 



group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to murine cell 
growth regulating nucleolar protein LYAR. 

The novel protein is very similar to murine Ly-1 antibody reactive clone protein (LYAR). It 
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate 
groups of an ATP/GTP nucleotide), but not the zinc finger motif and nuclear localization 
signals of LYAR. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440 



1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT 
51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT 
101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT 
151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA 
201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG 
251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA 
301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA 
351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA 
401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG 
451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG 
501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT 
551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC 
601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT 
651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA 
701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA 
751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG 
801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA 
851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA 
901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG 
951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA 
1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG 
1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG 
1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT 
1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG 
1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG 
1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA 
1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG 
1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA 
1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT 
1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1501 AAA 



BLAST Results 



NO BLAST result 



Medline entries 



93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is involved in cell
growth regulation.

Peptide information for frame 3



ORF from 144 bp to 1280 bp; peptide length: 379 
Category: strong similarity to known protein 
Classification: Cell division 
Prosite motifs: ATP_GTP_A (60-68) 
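
The ORF coordinates and the peptide length quoted above are related by simple arithmetic. A minimal sketch, assuming the coordinates are 1-based and inclusive and that the reported end position excludes the stop codon (the function name `orf_summary` is hypothetical):

```python
def orf_summary(start_bp, end_bp):
    """Return (reading frame, number of codons) for an ORF.

    Assumes 1-based inclusive coordinates that do not include the stop codon.
    """
    length_bp = end_bp - start_bp + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
    frame = (start_bp - 1) % 3 + 1
    return frame, length_bp // 3


print(orf_summary(144, 1280))  # (3, 379)
```

Under these assumptions the result agrees with the reported frame 3 and peptide length of 379.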



1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK 

51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ 

101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ 

151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK 

201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK 

251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED 

301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS 

351 EEELLVIFNK KISKNPTFKL LKDKVKLVK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14g5, frame 3 

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N = 
1, Score = 1410, P = 2.7e-144 

SWISSPROT:YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN 
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35 

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding 
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic 
sequence, complete sequence., N = 3, Score = 139, P = 4e-15 

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11 



>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388 

HSPs: 



Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144 
Identities = 275/388 (70%), Positives = 317/388 (81%) 
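
The percentages in the HSP header can be recomputed from the match counts. A small sketch, assuming (from the figures shown here, not from any BLAST documentation) that the report truncates the percentage instead of rounding it; `blast_percent` is a hypothetical helper name:

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching columns, truncated toward zero."""
    return (100 * matches) // aligned_length


print(blast_percent(275, 388), blast_percent(317, 388))  # 70 81
```

Rounding would give 71% and 82% for this HSP, so truncation is the reading that fits the reported 70% and 81%.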



Query:   1 MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 60
           MVFFTCNACGESVKKIQVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG
Sbjct:   1 MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG 60

Query:  61 KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 120
           KGYE KTHKGD KQQAWIQKI+ELIK+PNVSPKVRELL+QISAFDNVP KKAKFQNWMKN
Sbjct:  61 KGYEAKTHKGDAKQQAWIQKINELIKKPNVSPKVRELLQQISAFDNVPIKKAKFQNWMKN 120

Query: 121 SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEIS-TKVPASKVKDAVE 179
           SLKVH++S+L+QVW+IFSEAS+SE   ++Q Q P H  A PHAE+  TKVP++K     E
Sbjct: 121 SLKVHSDSVLEQVWDIFSEASSSE---QDQQQPPSH-TAKPHAEMPITKVPSAKTNGTTE 176

Query: 180 QQGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVP 239
           +Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA  EA GE+
Sbjct: 177 EQTEAKKNKRERKEERQKNRKKEKKELKLENHQENLRGQKPKKRKKNQEAGHEAAGEDGA 236

Query: 240 EANG-------SAGKRSKKKKQRKDSASEEEA----RVGAGKRKR-RHSEVETDSKKKKM 287
           + +G         G+ S++   R     E+ A    +  AGKRKR +HS  E+  KKKKM
Sbjct: 237 DGSGPPEKKKAQGGQASEEGADRNGGPGEDRAEGQTKTAAGKRKRPKHSGAESGYKKKKM 296

Query: 288 KLPEHPEGGEPEDDEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEH 347
           KLPE PE GE +D EAP+KGKFNWKGTIKA+LKQAPDNEI++KKL+KKV+AQY+ V ++
Sbjct: 297 KLPEQPEEGEAKDHEAPSKGKFNWKGTIKAVLKQAPDNEISVKKLKKKVIAQYHAVMNDT 356

Query: 348 HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK 379
              EEELL IFN+KIS+NPTFK+LKD+VKL+K
Sbjct: 357 SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK 388





Pedant information for DKFZphtes3_14g5, frame 3 
Report for DKFZphtes3_14g5.3 



[LENGTH] 379 

[MW] 43634.03 

[pI] 9.59 

[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11 

[BLOCKS] BL00603D Thymidine kinase cellular-type proteins 

[BLOCKS] BL00530C 

[PROSITE] ATP_GTP_A 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 18.73 % 
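
The LOW_COMPLEXITY keyword can plausibly be read as the share of residues masked with `x` in the SEG rows of the report. A hedged sketch of that reading; the run lengths used below (26, 18, 11, 5 and 11) are what the SEG rows of this report appear to contain after OCR and are an assumption, as is the helper name `low_complexity_percent`:

```python
def low_complexity_percent(seg_rows, protein_length):
    """Percentage of residues masked with 'x' across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / protein_length, 2)


# Masked runs as read from the SEG rows (assumed counts, see lead-in).
seg_rows = ["x" * 26, "x" * 18 + " " + "x" * 11, "x" * 5, "x" * 11]
print(low_complexity_percent(seg_rows, 379))  # 18.73
```

With 71 masked residues out of 379 this gives the reported 18.73 %, which supports the interpretation but does not prove how PEDANT derives the value.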



SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 

SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ 

SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 

SEG . .xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK 

SEG xxxxx 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 



Prosite for DKFZphtes3_14g5.3 
PS00017 60->68 ATP_GTP_A PDOC00017 
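
The P-loop hit listed above can be reproduced with a regular expression. This sketch uses the PROSITE PS00017 pattern `[AG]-x(4)-G-K-[ST]` written as a Python regex and scans the first 70 residues of the frame 3 peptide; the helper name `find_p_loops` is hypothetical:

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return 1-based (start, end) pairs for every P-loop match, overlaps included."""
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in P_LOOP.finditer(protein)]


n_terminus = ("MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVK"
              "CISEDQKYGGKGYEGKTHKG")
print(find_p_loops(n_terminus))  # [(60, 67)]
```

The start position agrees with the report. The eight-residue pattern ends at residue 67, while the report lists 60->68, so the report may use a different end convention.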



(No Pfam data available for DKFZphtes3_14g5.3) 

DKFZphtes3_14h21 



group: nucleic acid management 

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus 
RNA helicase and several RNA-dependent ATPases from the DEAD box family. 

RNA helicases comprise a large family of proteins that are involved in basic biological 
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD box and an ATP/GTP-binding site motif A (P-loop) 
and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2200 bp 

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140 
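
The start-site note in the header above (start at bp 33, consensus ACNatg) can be checked directly against the first bases of the insert. A minimal sketch, assuming a 1-based start coordinate; `matches_acn_atg` is a hypothetical helper and only tests the short ACNatg form named in the report, not a full Kozak context:

```python
import re


def matches_acn_atg(cdna, start_bp):
    """True if the three bases before the 1-based start read ACN and the codon is ATG."""
    i = start_bp - 1                      # 0-based index of the A in ATG
    if i < 3:
        return False
    window = cdna[i - 3:i + 3].upper()    # three upstream bases plus the start codon
    return re.fullmatch(r"AC[ACGT]ATG", window) is not None


first_40_bases = "CAACGACGTCGGACGCGCCCCTTCTTGGAACAATGTCCCA"
print(matches_acn_atg(first_40_bases, 33))  # True
```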



1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA 
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT 
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC 
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG 
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC 
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA 
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA 
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA 
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA 
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT 
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG 
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG 
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT 
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT 
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA 
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC 
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG 
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA 
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT 
951 CAACCCAGCC TTAAAGGTCA AAGGAATAGA CCCGGCATGT TAGTTCTAAC 
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT 
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT 
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC 
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA 
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT 
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC 
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT 
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT 
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA 
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG 
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC 
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA 
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA 
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT 
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA 
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA 
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG 
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG 
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC 
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT 
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT 
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA 
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA 
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 
No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294) 
DEAD_ATP_HELICASE (394-403) 



1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR 
51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT 
101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ 
151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK 
201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY 
251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG 
301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY 
351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK 
401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV 
451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA 
501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG 
551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS 
601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21, frame 3 

TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A, N = 1, Score = 1008, P = 1.1e-101 

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like 
protein."; S.pombe chromosome II P1 p8B7., N = 1, Score = 971, P = 
9.1e-98 

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 970, P = 1.2e-97 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 961, P = 1e-96 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91 



>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A 

Length = 504 



HSPs: 



Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101 
Identities = 211/473 (44%), Positives = 298/473 (63%) 



Query: 174 DQIREEGLKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEK 233 

D++-++E W K PI ++ YK +S + + ++ 

Sbjct: 23 DRLKDENFSWMK PIVRDLYKIPNEQKNLSPEQLQELYTNGGVMKVYPFREEST 75 

Query: 234 RPIPNPTCTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKT 293 

IP P + F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT 

Sbjct: 76 VKIPPPVNSFEQAFGSNASIMGEIRKNGFEKPSPIQSQMWPLLLSGQDCIGVSQTGSGKT 135 

Query: 294 LCYLMPGFIHLVLQPSL-----KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348 

L +L+P +H+ Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC 

Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195 

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408 
+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE 

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255 

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468 

I +IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW---SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524 

+ ++ + FL + + K+I+FV K +ADHLSSD + I+ + LHG R Q 

Sbjct: 316 PHDSRFLRVCEIVNFLTAAHGQNYKMIIFVKSKVMADHLSSDFCMKGINSQGLHGGRSQS 375 

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584 

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR 
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644 

G +++ L ND LI ILE++ Q +P++L AE++ K + R RP R 

Sbjct: 436 GEAMSFLWWNDRSNFEGLIQILEKSEQEVPDQLRRDAEKYRL---KCQSGRDGPRPSFRN 492 

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 



Pedant information for DKFZphtes3_14h21, frame 3 



Report for DKFZphtes3_14h21.3 



[LENGTH] 648 

[MW] 72873.51 

[pI] 8.84 

[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101 

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119C] 4e-72 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-70 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61 

[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 2e-49 

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 9e-45 

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44 

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 2e-28 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 5e-10 

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08 

[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07 

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 

[PIRKW] nucleus 4e-96 

[PIRKW] RNA binding 3e-87 

[PIRKW] DEAD box 5e-50 

[PIRKW] transmembrane protein 4e-27 

[PIRKW] DNA binding 3e-67 

[PIRKW] recF recombination pathway 3e-10 

[PIRKW] ATP 4e-96 

[PIRKW] purine nucleotide binding 5e-50 

[PIRKW] P-loop 4e-96 

[PIRKW] hydrolase 9e-45 

[PIRKW] protein biosynthesis 5e-50 

[PIRKW] ATP binding 1e-61 

[SUPFAM] WW repeat homology 8e-88 

[SUPFAM] DEAD/H box helicase homology 4e-96 

[SUPFAM] unassigned DEAD/H box helicases 7e-87 

[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96 

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43 

[SUPFAM] recQ protein 3e-10 

[SUPFAM] Bloom's syndrome helicase 5e-07 

[SUPFAM] translation initiation factor eIF-4A 5e-50 

[SUPFAM] recQ helicase homology 3e-10 

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-88 

[PROSITE] DEAD_ATP_HELICASE 1 

[PROSITE] ATP_GTP_A 1 

[PFAM] Helicases conserved C-terminal domain 

[PFAM] KH domain family of RNA binding proteins 

[PFAM] DEAD and DEAH box helicases 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.49 % 

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT 

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG 

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD 

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF 

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI 

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS 

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcecccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 



Prosite for DKFZphtes3_14h21.3 

PS00017 286->294 ATP_GTP_A PDOC00017 

PS00039 394->403 DEAD_ATP_HELICASE PDOC00039 
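
PROSITE hits such as the two above come from short sequence patterns that map directly onto regular expressions. The sketch below converts PROSITE pattern syntax to a Python regex and locates the DEAD-box signature in residues 381-410 of the peptide. The PS00039 pattern string is quoted from memory and should be treated as an assumption, and `prosite_to_regex` is a hypothetical helper that covers only the basic syntax elements (residues, `x`, `[...]`, `{...}`, repeat counts):

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern such as 'A-x(2)-[ST]' into a regex string."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # excluded residues
        if low is not None:
            core += ("{%s}" % low) if high is None else ("{%s,%s}" % (low, high))
        parts.append(core)
    return "".join(parts)


# Assumed PS00039 pattern (DEAD-box subfamily ATP-dependent helicases signature).
DEAD_BOX = prosite_to_regex("[LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]")

residues_381_410 = "MSNFVNLKNITYLVLDEADKMLDMGFEPQI"
hit = re.search(DEAD_BOX, residues_381_410)
print(DEAD_BOX, 381 + hit.start())  # [LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN] 394
```

The match starts at residue 394, in line with the reported hit; the report lists the end as 403, whereas the nine-residue pattern assumed here would end at 402.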



Pfam for DKFZphtes3_14h21.3 
HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296 

HMM UPMLQHIDwdPWpqpPQd. -PrALILAPTRELAMQIQEEcRJcFgkHMng 

L+P ++H+ +P +++ Q+ P +L+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343 

HMM IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleM 
+R++C+YGG N ++Q+++L++G+ +I+IATPGRL D+ +++ ++L++I++ 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392 

HMM LVMDEADRMLDMGFIDQIRrIMrqlPMpwNRQTMMFSATMPdeIqELARr 

LV+DEAD+MLDMGF++QI++I+ ++ ++RQT+M SAT+P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR--PDRQTVMTSATWPHSVHRLAQS 440 
HMM FMRNPIRInld.MdElTtnEnlkQwYiyVetEMWKfdcLcrLle* 
++++P + ++ D +++ +KQ +I+ E++K + ++++ 
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 



HMM_NAME KH domain family of RNA binding proteins 

HMM *r iilPedhMGMUGKGGsNIRqlREEYgvrlNIPdecCeDstdRIITI t 

+ + ++++G++IG+GGS ++++I I++E+ + + + I 
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-P ESLVKIF 

HMM G* 
G 

Query 116 G 116 



HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl GlrvmYIHGdMpQeERdelMddFNnGEynVLIcTD 

+ +++ L+ + +I+V ++HGD++Q++R++++++F++G+ ++LI+TD 
Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545 

HMM VggRGIDIPdVNHVTNYDMPWNPEqYIQRIGRTgRIG* 

+++RG+D+ DV HV+N+D+P+N+E Y + + R I GRTGR+G 
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG 582 
DKFZphtes3_14p14 



group: testes derived 

DKFZphtes3_14p14 encodes a novel 159 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3969 bp 

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927 

1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 

101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 

151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 

201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 

251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 

301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGTTGGGCT TGACATTCAG 

351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 

401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 

451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT 

501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 

551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 

601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 

651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 

701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 

751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 

801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 

851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATTCAGAAAA 

901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 

951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA 
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCTTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA 
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT 
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAG GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG 
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC 
2001 ATGCCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG 
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGGA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGCC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC 
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA 
2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT 
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG 
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA 
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG 
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 
3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT 
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG 
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG 
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 
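
The peptide listed for this ORF can be checked by translating the cDNA with the standard genetic code. A minimal sketch, assuming 1-based inclusive coordinates; the fragment below is bases 201-250 of the insert, so the ORF start at bp 216 corresponds to position 16 within the fragment (`translate` and `CODON_TABLE` are hypothetical names):

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna, start_bp, end_bp):
    """Translate cdna[start_bp..end_bp] (1-based, inclusive); unknown codons become X."""
    orf = cdna[start_bp - 1:end_bp].upper()
    usable = len(orf) - len(orf) % 3       # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X")
                   for i in range(0, usable, 3))


bases_201_250 = "CACGGATTCCAAGGAATGGAACGTTGGGCCATGCGTGTGAACGAGCTCTA"
print(translate(bases_201_250, 16, 48))  # MERWAMRVNEL
```

The output matches the first eleven residues of the 159 amino acid peptide.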



1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR 
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG 
151 SGRSYSSLQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p14, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_14p14, frame 3 



Report for DKFZphtes3_14p14.3 



[LENGTH] 159 

[MW] 17778.55 

[pI] 5.74 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04 

[KW] Alpha_Beta 
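
The [MW] entry is presumably an average molecular weight computed from the amino acid composition. The sketch below sums commonly tabulated average residue masses and adds one water molecule. Whether PEDANT used these exact mass values is an assumption, so the result should only be expected to land close to the reported figure; `molecular_weight` is a hypothetical helper:

```python
WATER = 18.01528

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def molecular_weight(protein):
    """Average molecular weight of an unmodified peptide chain."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
    except KeyError as err:
        raise ValueError("unknown residue: %s" % err) from err


peptide = ("MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGR"
           "HEVGHIDNSMKIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPL"
           "PGNWLWRHSLDLTLTQPPASEGSCPAAWPFLLRIWMGVQAPWGFKPLMAG"
           "SGRSYSSLQ")
print(round(molecular_weight(peptide), 2))
```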



SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM 

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS 

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ EGSCPAAWPFLLRIWMGVQAPWGFKPLMAGSGRSYSSLQ 
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_14p14.3) 
(No Pfam data available for DKFZphtes3_14p14.3) 

DKFZphtes3_14p7 



group: testes derived 

DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 
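
The two positions in the line above can be located by a simple scan of the 3' end. The sketch below looks for the first run of at least ten A residues and for the last AATAAA or ATTAAA hexamer upstream of it. The run-length threshold and the helper name `locate_polya` are assumptions; the fragment is the insert from base 2351 onward, truncated in the poly A tail. Judging from the entries in this report, the listed positions appear to be one less than the 1-based coordinate of the first base, which is what a 0-based index returns:

```python
import re


def locate_polya(cdna, offset=0, min_run=10):
    """Return (signal_position, polya_position) as 0-based offsets, or None.

    offset is the 0-based position of cdna[0] within the full insert.
    """
    tail = re.search("A{%d,}" % min_run, cdna)
    if tail is None:
        return None
    upstream = cdna[:tail.start()]
    # Lookahead so that overlapping candidate hexamers are all seen.
    signals = [m.start() for m in re.finditer(r"(?=A[AT]TAAA)", upstream)]
    signal = offset + signals[-1] if signals else None
    return signal, offset + tail.start()


from_base_2351 = ("AGATGAAAATATGTGCATTTTCAAGTAAATGACTTTTTCTTCTATTCTCT"
                  "ATTAAACAATTTAGTTCTAGTCTTAAAAAAAAAAAAAAAA")
print(locate_polya(from_base_2351, offset=2350))  # (2400, 2424)
```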



1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAAATAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA 
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA 
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG 
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA 
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG 
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT 
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT 
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT 
901 TGATTCATCA TTAGTAAGAA GTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 
1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGTGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG 
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA 
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA 
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT 
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCAGTAACAA CATGGATGGA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT 
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC 
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTCCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTG CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT 
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT 
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA 
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT 
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 
No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 



1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS 
51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD 
101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI 
151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN 
201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI 
251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR 
301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS 
351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT 
401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA 
451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV 
501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ 
551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD 
601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLFLSSFL 
651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP 
701 SF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2 

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, 
complete cds., N = 2, Score = 97, P = 0.00039 

>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete 
cds. 

Length = 772 



HSPs: 



Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04 
Identities = 45/163 (27%), Positives = 77/163 (47%) 



Query: 442 LTRVLANIAIHPGVGPVLAANPGIVGLLLTTLEYKSLDDCEELVINATATINNLSYYQVK 501
           L +++ NI+ H G P VG L + S D+ EE VI T+ NL+ +
Sbjct: 483 LMKMIRNISQHDG--PTKNLFIDYVGDLAAQI---SSDEEEEFVIECLGTLANLTIPDLD 537

Query: 502 -NSIIQDKKLYIAELLLKLLVSNNMDG-ILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMA 559
           ++++ KL + L KL D +LE V + G +S D + ++ + ++
Sbjct: 538 WELVLKEYKL-VPFLKDKLKPGAAEDDLVLEVVIMIGTVSMDDSCAALLAKSGIIPALIE 596

Query: 560 LLDAQHQDICFSACGVLL---NLTVDKDKR-VILKEGGGIKKLVDCLRD 604
           LL+AQ +D F C++ + + R VI+KE L+D + D
Sbjct: 597 LLNAQQEDDEF-VCQIIYVFYQMVFHQATRDVIIKETQAPAYLIDLMHD 644




Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04 
Identities = 42/178 (23%), Positives = 82/178 (46%) 

Query: 169 KLAKIILALKVSRKNLLNVCK-LIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNME 227 
           K K L V ++ LL V L+ ++ + + + ++N +I+ L++ L + N E 
Sbjct: 263 KTFKKYQGLWKQEQLLRVALYLLLNLAEDTRTELKMRNKNIVHMLVKALDRD NFE 318 

Query: 228 AFLYCMGSIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVT 287 
           + + +K +S + N+M+ VE L+ +I +E++ L + + 
Sbjct: 319 LLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDL LNITLR 366 

Query: 288 ATLRNLVDSSLVRSKFLNISALPQLCTAM--EQYKGDKDVCT--NIARI--FSKLTSYRD 341 
           L D+ L R+K + + LP+L + E YK +C +I+ F + +Y D 
Sbjct: 367 LLLNLSFDTGL-RNKMVQVGLLPKLTALLGNENYK-QIAMCVLYHISMDDRFKSMFAYTD 424 

Query: 342 CCTAL 346 
           C L 
Sbjct: 425 CIPQL 429 

Score = 69 (10.4 bits), Expect = 2.6e+00, Sum P(2) = 9.2e-01 
Identities = 35/146 (23%), Positives = 70/146 (47%) 

Query: 512 IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 571 
           I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+ 
Sbjct: 304 IVHMLVKALDRDNFELLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDLLNI 363 

Query: 572 ACGVLLNLTVDKDKRVILKEGGGIKKLVDCLRDLGPTDW-QLACLVCKTLWNFSENITNA 630 
           +LLNL+ D R++G+KL L G++ Q+A +C L++ S + 
Sbjct: 364 TLRLLLNLSFDTGLRNKMVQVGLLPKLTALL GNENYKQIA--MC-VLYHISMD-DRF 416 

Query: 631 SSCFGNEDT-NTLLLLLSSFLDEELALD 657 
           S F D L+ +L DE + L+ 
Sbjct: 417 KSMFAYTDCIPQLMKMLFECSDERIDLE 444 




Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03 
Identities = 18/58 (31%), Positives = 30/58 (51%) 

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241 
           LI +++RN H + L+ N++ L +L VLR + +L TN+ +C S G 
Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHG 212 


Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 26/122 (21%), Positives = 53/122 (43%) 

Query: 283 LVQVTATLRNL VDSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTS 338 
           + + + TL NL +D LV ++ +P L ++ + D+ + I S 
Sbjct: 521 VIECLGTLANLTIPDLDWELVLKEY KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 576 

Query: 339 YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 398 
           D C AL + S + L+N Q+ + V +++++ + + R+ KE + 
Sbjct: 577 MDDSCAALLAKSGIIPALIELLNAQQEDDEFVCQIIYVFYQMVF-HQATRDVIIKETQAP 635 

Query: 399 QTLLSL 404 
           L+ L 
Sbjct: 636 AYLIDL 641 




Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 44/177 (24%), Positives = 79/177 (44%) 

Query: 481 CE-ELVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 537 
           CE E ++N T + NLS+ ++N ++Q ++ LLL+N IA+V+ 
Sbjct: 355 CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGNENYKQI --AMCVLYH 409 

Query: 538 LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 596 
           +SD F +++ML+ +I +NL +K ++ EG G+K 
Sbjct: 410 ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRNVQLICEGNGLK 469 

Query: 597 KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEELAL 656 
           L+ R L D L+ K + N S++ + F + L +SS +EE + 
Sbjct: 470 MLMK--RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 522 

Query: 657 D 657 
           + 
Sbjct: 523 E 523 





Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02 
Identities = 20/66 (30%), Positives = 34/66 (51%) 



Query: 304 LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 362 

LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N+I + 

Sbjct: 171 LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 229 

Query: 363 YQKKQDL 369 
K+ +L 

Sbjct: 230 ELKRHEL 236 



Pedant information for DKFZphtes3_14p7, frame 2 



Report for DKFZphtes3_14p7.2 



[LENGTH] 708 

[MW] 79266.35 

[pI] 6.57 




[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04 

[BLOCKS] BL00923F Aspartate and glutamate racemases proteins 

[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 7.49 % 

SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH 

SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII 

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV 

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA 

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH 

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE 

SEG 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV 

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG 

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF 

SEG xxx 

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 



Prosite for DKFZphtes3_14p7.2 

PS00001 206->210 ASN_GLYCOSYLATION PDOC00001 
PS00001 212->216 ASN_GLYCOSYLATION PDOC00001 
PS00001 311->315 ASN_GLYCOSYLATION PDOC00001 
PS00001 385->389 ASN_GLYCOSYLATION PDOC00001 
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001 
PS00001 500->504 ASN_GLYCOSYLATION PDOC00001 
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001 
PS00001 584->588 ASN_GLYCOSYLATION PDOC00001 
PS00001 628->632 ASN_GLYCOSYLATION PDOC00001 
PS00001 632->636 ASN_GLYCOSYLATION PDOC00001 
PS00001 635->639 ASN_GLYCOSYLATION PDOC00001 
PS00005 173->176 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 295->298 PKC_PHOSPHO_SITE PDOC00005 
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005 
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005 
PS00005 421->424 PKC_PHOSPHO_SITE PDOC00005 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006 
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006 
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006 
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006 
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006 
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00006 654->658 CK2_PHOSPHO_SITE PDOC00006 
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006 
PS00008 17->23 MYRISTYL PDOC00008 
PS00008 64->70 MYRISTYL PDOC00008 
PS00008 144->150 MYRISTYL PDOC00008 
PS00008 384->390 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 473->479 MYRISTYL PDOC00008 
PS00008 533->539 MYRISTYL PDOC00008 
PS00008 580->586 MYRISTYL PDOC00008 
PS00008 641->647 MYRISTYL PDOC00008 
PS00009 67->71 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_14p7.2) 

DKFZphtes3_15al3 



group: testes derived 

DKFZphtes3_15al3 encodes a novel 387 amino acid protein with weak similarity to S. cerevisiae 
Hop1. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S. cerevisiae Hop1 

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST 
hits 

S. cerevisiae Hop1p is a meiosis-specific protein 

Sequenced by GBF 

Locus: unknown 

Insert length: 1848 bp 

Poly A stretch at pos. 1766, no polyadenylation signal found 

1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT 
51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA 

101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG 

151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT 

201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG 

251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG 

301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG 

351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC 

401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT 

451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA 

501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC 

551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA 

601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC 

651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA 

701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT 

751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA 

801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA 

851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA 

901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT 

951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG 
1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA 
1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA 
1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA 
1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT 
1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA 
1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG 
1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT 
1351 CCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA 
1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT 
1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC 
1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT 
1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA 
1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA 
1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT 
1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT 
1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 
1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 



1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE 

51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED 

101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI 

151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP 

201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK 

251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES 

301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN 

351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15al3, frame 2 

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score = 
274, P = 5.7e-22 

TREMBL:SC9877_9 gene: "hop1"; S. cerevisiae chromosome IX cosmid 9877., 
N = 2, Score = 126, P = 7.1e-09 

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 126, P = 7.8e-08 



>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 
Length = 562 

HSPs: 

Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22 
Identities = 84/290 (28%), Positives = 145/290 (50%) 

Query: 22 TEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDDLCVKILREDKNCPGSTQLVKW 81 
          TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + S +L+ W 
Sbjct: 11 TEQDSLLLTRNLLRIAIFNISYIRGLFPEKYFNDKSVPALDMKIKKLMPMDAESRRLIDW 70 

Query: 82 M-LGCYDALQKKYVYT NPEDPQTISECYQFKFKYTNNGP-- LMDFISK-- NQSN 130 
          M G YDALQ+KY+ T D I E Y F F Y+++ +M I++ N+ N 
Sbjct: 71 MEKGVYDALQRKYLKTLMFSICETVDGPMIEE-YSFSFSYSDSDSQDVMMNINRTGNKKN 129 

Query: 131 ESSMLST DTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPDYQPP 184 
           ST + ++ ++R + LM+ L +P++ + MKL YYD+VTPPDY+PP 
Sbjct: 130 GGIFNSTADITPNQMRSSACKMVRTLVQLMRTLDKMPDERTIVMKLLYYDDVTPPDYEPP 189 

Query: 185 GFKD--GDCEGVIFEGEPMY LNVGE VST PFHIFKVKVTT ERERMENIDSTILS 235 
           F+ D ++ P+ + +G V++ + +KV + E + M++ D + 
Sbjct: 190 FFRGCTEDEAQYVWTKNPLRHEIGNVNSKHLVLTLKVKSVLDPCEDENDDMQD-DGKSIG 248 

Query: 236 PKQIKTPFQKILRDKDVEDEQEHY TSDDLDIETKMEEQEKN PASSE 281 
           P + Q D ++ QE+ DD D E ++ ++PA +E 
Sbjct: 249 PDSVHDD-QPSDSDSEISQTQENQFIVAPVEKQDDDDGEVDEDDNTQDPAENE 300 



Pedant information for DKFZphtes3_15al3, frame 2 



Report for DKFZphtes3_15al3.2 



[LENGTH] 387 

[MW] 44417.64 

[pI] 5.57 

[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 9e-23 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11 

[PIRKW] nucleus 2e-09 

[PIRKW] zinc finger 2e-09 

[PIRKW] DNA binding 2e-09 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 



SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh 

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD 

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK 

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKMANGNQPVKSSKENRKRSQHESGR 

PRD ccccccchhhhhhhhhhcccccccccccccceeeecccccccccchhhhhhhhhhcccce 

SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI 

PRD eeeeecccccccccccccccccccccc 



Prosite for DKFZphtes3_15al3.2 

PS00001 127->131 ASN_GLYCOSYLATION PDOC00001 
PS00001 130->134 ASN_GLYCOSYLATION PDOC00001 
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001 
PS00004 140->144 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 139->142 PKC_PHOSPHO_SITE PDOC00005 
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00006 96->100 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 177->181 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006 
PS00006 268->272 CK2_PHOSPHO_SITE PDOC00006 
PS00006 280->284 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00006 346->350 CK2_PHOSPHO_SITE PDOC00006 
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006 
PS00008 84->90 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_15al3.2) 

DKFZphtes3_15c24 






group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to 
2-hydroxyacid dehydrogenases. 

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenases signature. 
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase 
(EC 1.1.1.28) catalyzes the oxidation of D-lactate to pyruvate; D-glycerate dehydrogenase 
(EC 1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; 3-phosphoglycerate 
dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of D-3-phosphoglycerate to 
3-phosphohydroxypyruvate. 
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase. 

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent 
pathways and as a new enzyme for biotechnological production processes. 



strong similarity to C.elegans T03F1.1 

potential start at Bp 55 matches kozak consensus PyCCatgG 

Sequenced by GBF 

Locus: unknown 

Insert length: 1956 bp 

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903 



1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC 
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG 
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC 
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA 
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA 
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT 
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA 
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC 
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA 
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA 
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA 
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG 
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG 
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA 
701 TACAGCTTAT AATTCCTGGA GAATCTGCTT GTTTTGCGTG TGCTCCACCA 
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT 
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG 
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC 
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC 
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA 
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA 
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT 
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG 
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC 
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGGAAG ACCTCATGGC 
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT 
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT 
1351 TAGGGCAACA TTAATTAATG TATATT^TTA CCTGAATTGT TATACTTTTT 
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC 
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA 
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT 
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA 
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC 
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG 
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC 
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA 
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA 
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG 
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA GACAAAAAAA AAAAAAAAAA 
1951 AAAAAG 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein. 
Classification: Metabolism 

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105) 



1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 

51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL 

101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 

151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 

201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 

251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN 

301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS 

351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK 

401 MKNM 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15c24, frame 1 

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid 
T03F1., N = 1, Score = 1204, P = 1.9e-122 

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72 

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus 
fulgidus, N = 1, Score = 218, P = 1.8e-17 

TREMBL:AF022796_4 gene: "moeB"; product: "MoeB"; Staphylococcus 
carnosus molybdenum cofactor biosynthetic gene cluster, complete 
sequence., N = 1, Score = 220, P = 3.7e-16 

>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 
Length = 419 



HSPs: 



Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122 
Identities = 241/367 (65%), Positives = 293/367 (79%) 

Query: 37 
       R +IEK+S+EVVDSNPYSRLMAL+RMGIV++YE+IR VA+VGVGGVGSV AEMLTRCG 
Sbjct: 48 

Query: 97 
       IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA TL ++NPDV EVHN+NITT++N 
Sbjct: 108 

Query: 157 
       F F++RI G L +GK +DLVLSCVDNFEARM +N ACNE Q WMESGVSENAVSGHI 
Sbjct: 168 

Query: 217 
       Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF 
Sbjct: 227 

Query: 277 
       G VS Y+GYNA+ DFFP S+KPNP CDD +C ++Q+EY++KVA P EV + EEE + 
Sbjct: 287 

Query: 335 
       +HEDNEWGIELV+E SE G+ AY P K+ D+ TEL+ 
Sbjct: 347 

Query: 395 EDLMAKMKN 403 
           D M +K + 
Sbjct: 400 HDFMKSIKD 408 

Pedant information for DKFZphtes3_15c24, frame 1 

Report for DKFZphtes3_15c24.1 

[LENGTH] 404 

[MW] 44863.36 

[pI] 4.79 

[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-1 

[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) 
[S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 

[BLOCKS] BL01042A Homoserine dehydrogenase proteins 

[PIRKW] thiamine pyrophosphate 1e-07 

[PIRKW] molybdenum 5e-07 

[PIRKW] molybdopterin biosynthesis 5e-07 

[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12 

[PROSITE] D_2_HYDROXYACID_DH_1 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 0.66 % 

SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc 
MEM 



SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP 

SEG xxxxxxxxx 

PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS 

SEG 

PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee 



MEM 



SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID 

SEG 

PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc 

MEM 

SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc 

MEM 

SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP 

SEG xxxxxxxxxxxxxxx . . . xxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc 

MEM 

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM 

SEG 

PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc 

MEM 



Prosite for DKFZphtes3_15c24.1 
PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063 

(No Pfam data available for DKFZphtes3_15c24.1) 

DKFZphtes3_15c6 



group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1283 bp 

Poly A stretch at pos. 1264, no polyadenylation signal found 



1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 
51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 
101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 
151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 
201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGGCTTGGAG CCAGAGACAC 
251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 
301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 
351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 
401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 
451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT 
501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 
551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT 
601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 
651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTCCTC CTATCCTCAG 
701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC 
751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 
801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 
851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 
901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 
951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 
1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 
1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 
1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 
1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 
1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 
1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 



1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 
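
The first residues of this peptide can be reproduced from the cDNA above by translating from position 461 with the standard genetic code. The sketch below is illustrative only: the `translate` helper and the compact codon-table strings are assumptions introduced here, not the tool that produced this listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 461-490 of the DKFZphtes3_15c6 insert (start of the ORF)
print(translate("ATGGTAGCTA TTCCACCCTC TGCCTGCCTG"))  # MVAIPPSACL
```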

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 

PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 
76, P = 0.33 



>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana 
Length = 258 



Score = 76 (11.4 bits), Expect = 4.0e-01, P = 3.3e-01 
Identities = 30/91 (32%), Positives = 44/91 (48%) 

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74 

PG GA P+ R+ F+ PF + +E+ A C P SSL+ A G L 

Sbjct: 52 PGRGA-PLARVTFRH PFRF KKQKELFVAAEVCTPVSSLYCGKKATLWGNVLP 103 

Query: 75 FSSYPQGPVLLCP HV-PLGCLVEALYNFSLVL 105 

S P+G V+ C HV G L A ++++V+ 
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYAIVI 137 
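
The percentages and the P value in the HSP header above follow from the raw counts and the Expect value. As a hedged illustration, the sketch below assumes that percentages are truncated to whole numbers and that P is related to the Expect value E by P = 1 - exp(-E). Both assumptions agree with the figures printed in this listing, and the helper names are hypothetical.

```python
import math


def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumption)."""
    return (100 * matches) // aligned


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number E."""
    return 1.0 - math.exp(-expect)


# HSP against PIR:S54250: Identities 30/91, Positives 44/91, Expect 4.0e-01
print(percent(30, 91), percent(44, 91))  # 32 48
print(f"{p_from_expect(0.40):.1e}")      # 3.3e-01
```

For small Expect values the two quantities are nearly equal, which is why the stronger hits in this listing show identical Expect and P figures.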

Pedant information for DKFZphtes3_15c6, frame 2 



Report for DKFZphtes3_15c6.2 



[LENGTH] 118 

[MW] 12413.79 

[pI] 7.53 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 



SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF 

PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc 

MEM 

SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP 

PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc 

MEM MMMMMMMMMMMMMMMMM . 



Prosite for DKFZphtes3_15c6.2 

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001 
PS00008 70->76 MYRISTYL PDOC00008 
PS00029 84->106 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_15c6.2) 

DKFZphtes3_15gl4 



group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae 
hypothetical protein YOR243c. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to YOR243c 

complete cDNA, complete cds, potential start codon at Bp 35, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3495 bp 

Poly A stretch at pos. 3462, no polyadenylation signal found 

1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 
51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 

101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 

151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 

201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 

251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA 

301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 

351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 

401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 

451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 

501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 

551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 

601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 

651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 

701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 

751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 

801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 

851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 

901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 

951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 
1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 
1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 
1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 
1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 
1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 
1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 
1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 
1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 
1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT 
1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC 
1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 
1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 
1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 
1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 
1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 
1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 
1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 
1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA 
1901 CTACAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG 
1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA 
2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA 
2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC 
2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC 
2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA 
2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC 
2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT 
2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC 
2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG 
2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT 
2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC 
2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG 
2551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA 
2601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT 
2651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA 
2701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA 
2751 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGC
2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG
2851 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC
2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA
2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA
3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG
3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA
3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA
3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG
3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG
3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT
3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA
3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA
3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG
3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 
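
The reported peptide length can be checked against the ORF coordinates with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon (an assumption that is consistent with the values in this report but is not stated by it). The helper name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon lies outside the reported range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 35 bp to 2137 bp, as reported above
print(orf_peptide_length(35, 2137))
```

Under these assumptions the result agrees with the stated peptide length of 701.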



1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI
51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN
101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS
151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN
201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN
251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV
301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV
351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK
401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL
451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA
501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY
551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG
601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN
651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD
701 V



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15gl4, frame 2 

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 516, P = 7.3e-54

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34



>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676

HSPs:

Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 151/498 (30%), Positives = 245/498 (49%)

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249
+ E V P L +L + EE+ Y A K + F+ +K R +H +
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE DKSVRTKIHQLL 164

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307

+ F N +E+ + N +EK ++ R + G + FTL 

Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224 

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366

KEN + EA+ + KL +PS YAG KD++A+T Q + + K+ +RL + + + 
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282 

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426 

K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+ 
Sbjct: 283 -KGMIIGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485 

NY+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA 
Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399 

Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ--AWFS----LPHSMRIFYVHAYTSKIW 539

L MP + E ALL +L E+G A+++ +P ++R YVHAY S +W

Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459

Query: 540 NEAVSYRLETYGARVVQGDLVC-------LDEDIDDENFPNS-------KIHLVTEEEGS 585

N S R+E +G ++V GDLV L IDDE+F + VT+E+

Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644 

+ Y + VVLP G+++ YP N+ + Q Y DIL D + + ++ G YR + 

Sbjct: 520 SVKYTMEDVVLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671

++ P +L Y+++ D + + +D
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606



Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01
Identities = 40/160 (25%), Positives = 77/160 (48%)



Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81 

GF G IK +DF+V EID++G++++ T D+ FK+ + +P K +++ + S E 

Sbjct: 55 GFRGQIKQRYTDFLVNEIDQEGKVIHLT-DKG-FKM PK KPQR--SKEEVNAEKES-E 106

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS--EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138 

R QE + D + +Q +ED + ++ + K + +F D+ ++ 

Sbjct: 107 AARRQEFNV DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKN FEDKSVRTKIH 161 

Query: 139 NFAC DVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181 

+RE + ++ E + FIR ++N R + I Q 

Sbjct: 162 QL LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201 

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 10/23 (43%), Positives = 17/23 (73%)

Query: 676 SLLISFDLDASCYATVCLKEIMK 698 

++++ F L S YAT+ L+E+MK 
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660 
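
The percentages printed next to the Identities and Positives counts in the HSP headers above can be reproduced from the raw counts. The sketch below assumes the percentages are truncated rather than rounded, which is what the third HSP suggests (17/23 is shown as 73%). The function name `percent_truncated` is a hypothetical helper, not part of any BLAST tool.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Integer percentage of matches over aligned positions, truncated."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


# (identities, positives, aligned length) for the three HSPs listed above
hsps = [(151, 245, 498), (40, 77, 160), (10, 17, 23)]
for identities, positives, aligned in hsps:
    print(percent_truncated(identities, aligned),
          percent_truncated(positives, aligned))
```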



Pedant information for DKFZphtes3_15gl4, frame 2 



Report for DKFZphtes3_15gl4.2

[LENGTH] 701 

[MW] 80700.96 

[pI] 7.31

[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e-51

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53

[BLOCKS] BL01268C

[BLOCKS] BL01268B

[BLOCKS] BL01268A

[SUPFAM] hypothetical protein HI0701 3e-06

[PROSITE] MYRISTYL 7

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 16

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 13

[PROSITE] ASN_GLYCOSYLATION 5

[KW] Alpha_Beta 

SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR 

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhheeecccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhcccpcceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN 

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhheeeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeecccccccccccceeeccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD 

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV 

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 



Prosite for DKFZphtes3_15gl4.2

PS00001 47->51 ASN_GLYCOSYLATION PDOC00001
PS00001 77->81 ASN_GLYCOSYLATION PDOC00001
PS00001 266->270 ASN_GLYCOSYLATION PDOC00001
PS00001 404->408 ASN_GLYCOSYLATION PDOC00001
PS00001 650->654 ASN_GLYCOSYLATION PDOC00001
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005
PS00005 232->235 PKC_PHOSPHO_SITE PDOC00005
PS00005 237->240 PKC_PHOSPHO_SITE PDOC00005
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005
PS00005 381->384 PKC_PHOSPHO_SITE PDOC00005
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005
PS00005 625->628 PKC_PHOSPHO_SITE PDOC00005
PS00005 632->635 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006
PS00006 211->215 CK2_PHOSPHO_SITE PDOC00006
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006
PS00006 580->584 CK2_PHOSPHO_SITE PDOC00006
PS00006 668->672 CK2_PHOSPHO_SITE PDOC00006
PS00007 537->546 TYR_PHOSPHO_SITE PDOC00007
PS00008 25->31 MYRISTYL PDOC00008
PS00008 43->49 MYRISTYL PDOC00008
PS00008 114->120 MYRISTYL PDOC00008
PS00008 326->332 MYRISTYL PDOC00008
PS00008 385->391 MYRISTYL PDOC00008
PS00008 514->520 MYRISTYL PDOC00008
PS00008 622->628 MYRISTYL PDOC00008
PS00009 287->291 AMIDATION PDOC00009
PS00009 436->440 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_15gl4.2)
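
The Prosite ranges for this clone can be reproduced by scanning the peptide with the corresponding pattern. The sketch below is an illustration only: it writes the N-glycosylation pattern PS00001 (N-{P}-[ST]-{P}) as a regular expression and reports hits as `start->end` pairs in which `end` is the start plus the motif length, which is the convention the listing appears to use. The names `ASN_GLYCOSYLATION` and `scan` are hypothetical, and the input is the first 100 residues of the frame 2 peptide.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}, written as a regex.
# The lookahead allows overlapping sites to be reported.
ASN_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def scan(pattern, peptide):
    """Return (start, end) pairs, 1-based, with end = start + motif length."""
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in pattern.finditer(peptide)
    ]


# First 100 residues of the DKFZphtes3_15gl4 frame 2 peptide
n_terminus = (
    "MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTI"
    "DEPIFKISEIQLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQN"
)

for start, end in scan(ASN_GLYCOSYLATION, n_terminus):
    print(f"PS00001 {start}->{end} ASN_GLYCOSYLATION")
```

On this fragment the scan finds the two sites listed for positions 47 and 77; the remaining sites lie beyond residue 100.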

DKFZphtes3_15hl



group: testes derived 

DKFZphtes3_15hl encodes a novel 672 amino acid protein with very weak similarity to several 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 

1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 

51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 

101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 

151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 

201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 

251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 

301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 

351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 

401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 

451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC 

501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 

551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA 

601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 

651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 

701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 

751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 

801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 

851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 

901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 

951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 

1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 

1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 

1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 

1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 

1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 

1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 

1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 

1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 

1401 CAGGCACAAG TGAAGCTGAG AGACTTCGAG TCAGCCGTGA ACAATTTTGA 

1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG 

1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 

1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 

1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 

1651 TGAGAGATGA GCCCGAGAAG GTGGTGAAGC AGTGGGACCA TAGTGAGGAT 

1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 

1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT 

1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 

1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 

1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 

1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 

2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 

2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 

2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 

2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 

2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 

2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 

BLAST Results 



No BLAST result

Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 69 bp to 2084 bp; peptide length: 672 
Category: similarity to known protein 
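
The start of the peptide can be cross-checked against the cDNA by translating from the reported ORF start. The sketch below assumes the standard genetic code and 1-based coordinates; `CODON_TABLE` and `translate` are hypothetical names, and `cdna_head` is simply the first 100 bases of the insert sequence given above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cdna: str, orf_start: int, n_codons: int) -> str:
    """Translate n_codons codons starting at the 1-based position orf_start."""
    begin = orf_start - 1
    codons = (cdna[i:i + 3] for i in range(begin, begin + 3 * n_codons, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Bases 1-100 of the DKFZphtes3_15hl insert
cdna_head = (
    "AAACCAGATAGAGGTTCTCCAGCTTTTCTTTGATTGTCTCTGCTTTAGCG"
    "TCTCTAAATCCGGTCACCATGTCGGACCCCGAAGGCGAGACCTTGCGAAG"
)

print(translate(cdna_head, 69, 10))
```

The ten residues obtained this way match the first block of the peptide listed for frame 3.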



1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL
51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA
101 LVFYHRGYKL RPDREFRVGI QKAQEAINNS VGSPSSIKLE NKGDLSFLSK
151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY
201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER
251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV
301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA
351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF
401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL
451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV
501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE
551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL
601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG
651 KTQFGEIGET KKTGNEMEKE YE



BLASTP hits



Entry AF039202_1 from database TREMBL:

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus

Hsp70/Hsp90 organizing protein mRNA, complete cds.

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160

Entry AI09782_1 from database TREMBL:

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain

mRNA, complete cds.

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623

Entry S56658 from database PIR:
stress-induced protein sti1 - soybean

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153



Alert BLASTP hits for DKFZphtes3_15hl, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15hl, frame 3 



Report for DKFZphtes3_15hl.3



[LENGTH] 672

[MW] 76655.61

[pI] 5.49

[HOMOL] PIR:S56658 stress-induced protein sti1 - soybean 6e-10

[SUPFAM] tetratricopeptide repeat homology 1e-07

[PROSITE] MYRISTYL 7

[PROSITE] AMIDATION 3

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 15

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 11

[PROSITE] ASN_GLYCOSYLATION 2

[KW] All_Alpha

[KW] LOW_COMPLEXITY 4.76 %



SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM 

SEG 

PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh 

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI 

SEG 

PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh 

SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS



PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh 

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc 

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh 

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS 

SEG 

PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh 

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG 

SEG 

PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh 

SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh 

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD

SEG * 

PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc 

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL 

SEG xxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh 

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET 

SEG 

PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc 

SEQ KKTGNEMEKEYE 

SEG 

PRD cccccccccccc 
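
The PRD lines can be summarised as fractions of each predicted state. The sketch below assumes that `h`, `e` and `c` denote helix, extended strand and coil respectively, and that a keyword such as All_Alpha reflects a helix-dominated prediction; the exact rule PEDANT applies is not given here. The function name `ss_fractions` is hypothetical.

```python
from collections import Counter


def ss_fractions(prd: str) -> dict:
    """Fraction of residues predicted as helix (h), strand (e), coil (c)."""
    if not prd:
        raise ValueError("empty prediction string")
    counts = Counter(prd)
    return {state: counts.get(state, 0) / len(prd) for state in "hec"}


# Concatenate all PRD lines of a report before calling; a short toy input
# is used here for illustration.
fractions = ss_fractions("hhhhcceecc")
for state, value in fractions.items():
    print(state, f"{value:.2f}")
```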

PS00001 128->132 ASN_GLYCOSYLATION PDOC00001
PS00001 438->442 ASN_GLYCOSYLATION PDOC00001
PS00004 265->269 CAMP_PHOSPHO_SITE PDOC00004
PS00004 605->609 CAMP_PHOSPHO_SITE PDOC00004
PS00004 613->617 CAMP_PHOSPHO_SITE PDOC00004
PS00004 636->640 CAMP_PHOSPHO_SITE PDOC00004
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00005 183->186 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 564->567 PKC_PHOSPHO_SITE PDOC00005
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005
PS00005 660->663 PKC_PHOSPHO_SITE PDOC00005
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 171->175 CK2_PHOSPHO_SITE PDOC00006
PS00006 220->224 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 382->386 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006
PS00006 512->516 CK2_PHOSPHO_SITE PDOC00006
PS00006 542->546 CK2_PHOSPHO_SITE PDOC00006
PS00006 548->552 CK2_PHOSPHO_SITE PDOC00006
PS00006 628->632 CK2_PHOSPHO_SITE PDOC00006
PS00006 663->667 CK2_PHOSPHO_SITE PDOC00006
PS00007 506->515 TYR_PHOSPHO_SITE PDOC00007
PS00008 119->125 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00008 213->219 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 320->326 MYRISTYL PDOC00008
PS00008 334->340 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00009 596->600 AMIDATION PDOC00009
PS00009 603->607 AMIDATION PDOC00009
PS00009 641->645 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_15hl.3)

DKFZphtes3_15i5



group: cell structure and motility 

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead
proteins.

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of
flagella or axoneme and to the Strongylocentrotus purpuratus sea urchin spermatozoa protein
p63. This protein is important for the maintenance of a planar form of sperm flagellar
beating. In addition, the novel protein contains a transferrin signature 1 for iron-binding.
The new protein seems to be a part of the human radial spoke heads in spermatozoa.

BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in modulating the structure of the human spermatozoa
radial spoke head and modulation of sperm motility in men.



strong similarity to "radial spokehead" proteins 

complete cDNA, complete cds, 1 EST hit (from a testis library)
"radial spokehead" part of flagella in Chlamydomonas, this protein
seems to be part of the sperm motor or tail

Sequenced by GBF 

Locus: unknown 

Insert length: 2478 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433



1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG
51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG
101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG
151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG
201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT
251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA
301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG
351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA
401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT
451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG
501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC
551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC
601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT
651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG
701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA
751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG
801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC
851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC
901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC
951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC
1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA
1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC
1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA
1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC
1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA
1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC
1301 GCACGGCGAG GAGGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG
1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC
1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT
1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC
1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC
1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC
1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA
1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC
1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA
1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC
1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG
1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA
1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC
1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC
2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC
2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA
2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC
2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA
2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG
2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA 
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG 
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



No BLAST result 



Medline entries 



86251010:

Molecular cloning and expression of flagellar radial spoke and dynein genes of
Chlamydomonas.

81142496:

Radial spokes of Chlamydomonas flagella: polypeptide composition and phosphorylation of
stalk components.

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 



ORF from 144 bp to 2294 bp; peptide length: 717
Category: strong similarity to known protein 



1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA 

51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP 

101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN 

151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ 

201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE 

251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP 

301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG

351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV 

401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV 

451 NARKIKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY

501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP 

551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE 

601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH 

651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT
701 EEEEEGEEEE EGEETDD 



Entry U73123_1 from database TREMBL:

product: "radial spokehead"; Strongylocentrotus purpuratus radial
spokehead mRNA, complete cds.

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523

Entry S44498 from database PIR:

radial spoke protein 6 - Chlamydomonas reinhardtii

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264



Peptide information for frame 3 



BLASTP hits



Alert BLASTP hits for DKFZphtes3_15i5, frame 3



No Alert BLASTP hits found



Pedant information for DKFZphtes3_15i5, frame 3



Report for DKFZphtes3_15i5.3 



[LENGTH] 717

[MW] 80913.61

[pI] 4.36

[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds. 1e-130

[PROSITE] TRANSFERRIN_1 1

[PROSITE] MYRISTYL 5

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 14

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 21.48 %



SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR 

SEG .... xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccccccccccccc 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG xxxx 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhhhhhh 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhccccchhhh 

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP 

SEG xxxxxxxxxxxxxxxx . .

PRD hhhchhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc 

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV

SEG xxx 

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhhhhhcccchhhhhhhh 

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeecccccccccccccccccc 

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY 

SEG 

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhcccccccccccccccchhhh 

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecchhhh 

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc 

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA

SEG 

PRD cccccccccccccccccccceeeeeeccccceeeecccccceeeeeeccccccccccccc 

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD

SEG xxxxxxxxxxxxxx . . . xxxxxxxxxxxxxx . . . 

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
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PS00002 
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PS00004 


18->22 


PS00004 


26->30 
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24->27 


PS00005 


58->61 


PS00005 
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PS0Q0O5 


323->326 


PS00005 


341->344 


PS00005 


608->611 


PS00005 


637->640 


PS00006 


64->68 


PS00006 


137->141 



ASN_GLYCOSYLATION 
GLYCOSAMINOGLYCAN 
CAMP_PHOSPHO_SITE 
CAMP_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
PKC_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 
CK2_PHOSPHO_SITE 



PDOC00001 
PDOC00002 
PDOC00004 
PDOC00004 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00006 
PDOC00006 
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PS00006 


216->220 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00006 319->323 CK2_PHOSPHO_SITE PDOC00006
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006
PS00006 563->567 CK2_PHOSPHO_SITE PDOC00006
PS00006 671->675 CK2_PHOSPHO_SITE PDOC00006
PS00006 682->686 CK2_PHOSPHO_SITE PDOC00006
PS00006 700->704 CK2_PHOSPHO_SITE PDOC00006
PS00007 639->646 TYR_PHOSPHO_SITE PDOC00007
PS00008 284->290 MYRISTYL PDOC00008
PS00008 315->321 MYRISTYL PDOC00008
PS00008 350->356 MYRISTYL PDOC00008
PS00008 435->441 MYRISTYL PDOC00008
PS00008 475->481 MYRISTYL PDOC00008
PS00009 16->20 AMIDATION PDOC00009
PS00009 637->641 AMIDATION PDOC00009
PS00205 619->628 TRANSFERRIN_1 PDOC00182



(No Pfam data available for DKFZphtes3_15i5.3) 
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DKFZphtes3_15jl8 



group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, few EST hits 
Sequenced by GBF 
Locus: unknown 
insert length: 905 bp 

Poly A stretch at pos. 839, polyadenylation signal at pos. 815 

1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 
51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGGAGGG CACAGCTTGT 
101 CCAGCTGGGA TGTTTGGGTG CCCTGTGAGA TGCCCCAAGC CACCAACCCA 
151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 
201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG 
251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC 
301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 
351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 
401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA 
451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG 
501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 
551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA 
601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 
651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 
701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 
751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT 
801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 
851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAACCTTACG TACGCGAAAA 
901 AAAAG 
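
The poly(A) and polyadenylation-signal positions reported above can be reproduced from the tail of this sequence. The following is only a minimal sketch, not the method used by the annotation pipeline. It assumes that the signal is the canonical AATAAA hexamer, that a run of at least ten A residues counts as the poly(A) stretch (an arbitrary threshold), and that positions are 0-based offsets, which is what the reported values 815 and 839 appear to correspond to for this clone. The function names are hypothetical.

```python
import re

# Bases 801-860 of the insert, copied from the listing above
# (1-based numbering as printed).
TAIL = (
    "GTTATGAATG" "TGGTCAATAA" "ATGTTATTTG"
    "TGAAACTCTA" "AAAAAAAAAA" "AAAAAAAAAG"
)
TAIL_OFFSET = 800  # 0-based offset of the first base of TAIL in the insert


def find_polya_signal(seq, offset=0, signal="AATAAA"):
    """Return the 0-based position of the first signal hexamer, or None."""
    idx = seq.find(signal)
    return None if idx < 0 else idx + offset


def find_polya_stretch(seq, offset=0, min_len=10):
    """Return the 0-based start of the first run of >= min_len A residues."""
    match = re.search("A{%d,}" % min_len, seq)
    return None if match is None else match.start() + offset


print("polyadenylation signal at", find_polya_signal(TAIL, TAIL_OFFSET))
print("poly(A) stretch at", find_polya_stretch(TAIL, TAIL_OFFSET))
```

Adding 1 to either result gives the corresponding position in the 1-based numbering printed at the start of each sequence row.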

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 553 bp; peptide length: 148 
Category: putative protein 



1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA 
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 
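
The start of the frame 2 peptide can be checked against the cDNA by translating the first codons of the ORF. The sketch below assumes the standard genetic code and uses bases 110-139 of the insert as printed in the nucleotide listing; the helper name translate is hypothetical and this is not the tool that generated the report.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 110-139 of the DKFZphtes3_15j18 insert (start of the ORF).
ORF_START = "ATGTTTGGGTGCCCTGTGAGATGCCCCAAG"
print(translate(ORF_START))
```

The printed peptide should agree with the first ten residues of the listing above (MFGCPVRCPK).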

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15jl8, frame 2 






WO 01/12659 



PCT/IB00/01496 



Report for DKFZphtes3_15j18.2 



[LENGTH] 148 

[MW] 15665.78 

[pI] 8.91 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 1 

[KW] Irregular 



SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 

PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 

PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 

PRD cccccccccccccccccccccccccccc 



Prosite for DKFZphtes3_15j18.2 

PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 42->48 MYRISTYL PDOC00008 

PS00008 49->55 MYRISTYL PDOC00008 
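
The Prosite hits listed above can be reproduced with a simple pattern scan. The sketch below is an illustration only, not the PEDANT implementation. It assumes the usual Prosite consensus patterns for PS00006 ([ST]-x(2)-[DE]) and PS00008 (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}), rewritten as regular expressions, and it assumes that a hit is reported as "start->end" with a 1-based start and an end equal to start plus pattern length, which is how the values in the table appear to be written. A lookahead is used so that overlapping hits (such as 38->44 and 42->48) are both found.

```python
import re

# Frame 2 peptide of DKFZphtes3_15j18 (148 residues), from the SEQ lines above.
PEPTIDE = (
    "MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI"
    "SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH"
    "CGAPLPSAHGGFPRARAEGSWSQPGAGS"
)

# Prosite consensus patterns expressed as regular expressions (assumed forms).
PATTERNS = {
    "CK2_PHOSPHO_SITE": "[ST].{2}[DE]",
    "MYRISTYL": "G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(seq, regex):
    """Return (start, end) pairs for all hits, including overlapping ones."""
    hits = []
    for match in re.finditer("(?=(%s))" % regex, seq):
        start = match.start() + 1              # 1-based start
        end = start + len(match.group(1))      # start + pattern length
        hits.append((start, end))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print("%d->%d %s" % (start, end, name))
```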



(No Pfam data available for DKFZphtes3_15j18.2) 
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DKFZphtes3_15j3 



group: nucleic acid management 

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins with 
unknown function. 

The novel protein contains an RNA recognition motif, predicted by Pfam, and therefore binds to 
RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein 
seems to be a new RNA-modifying protein. 

The new protein can find application in modulating RNA metabolism in human cells and as a 
tool for biotechnological manipulations. 

"44M2.3"; product, differences to genmodel, similarity to ribonuclease 
H 

complete cDNA, complete cds, EST hits 
YGR276C - ribonuclease H 
differences to genmodel of 44M2.3 

Sequenced by GBF 

Locus: /map="16p11.2" 

Insert length: 2695 bp 

Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579 



1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 
51 GCCCGTCTTC TCTTCTGCGT TTCCCGCGCT AGGGGGCGTG GGGAGTGGTT 
101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 
151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 
201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 
251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT 
301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 
351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 
401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 
451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA 
501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA 
551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT 
601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 
651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 
701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 
751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 
801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 
851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC 
901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 
951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 
1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC 
1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG 
1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG 
1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG 
1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA 
1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA 
1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC 
1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG 
1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT 
1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG 
1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA 
1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC 
1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT 
1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT 
1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG 
1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT 
1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG 
1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA 
1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 
1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 
2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 
2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 
2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 
2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 
2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC 
2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 
2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA 
2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG 
2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC 
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2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC 

2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA 

2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG 

2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 188 bp to 2416 bp; peptide length: 743 
Category: similarity to known protein 
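
The stated peptide length follows directly from the ORF coordinates. The short sketch below assumes that the coordinates are 1-based and inclusive and that the end coordinate excludes the stop codon, which is consistent with the ORF statements in this listing; the function name is hypothetical.

```python
def peptide_length(orf_start, orf_end):
    """Number of residues encoded by an ORF given inclusive 1-based coordinates.

    Assumes the stop codon lies outside the stated range.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length %d is not a multiple of 3" % n_bases)
    return n_bases // 3


print(peptide_length(188, 2416))  # DKFZphtes3_15j3, frame 2
print(peptide_length(110, 553))   # DKFZphtes3_15j18, frame 2
```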



1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL 
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF 
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA 
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF 
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV 
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL 
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI 
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE 
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN 
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK 
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL 
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP 
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW 
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN 
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15j3, frame 2 

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; 
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence., 
N = 2, Score = 1827, P = 2.1e-284 

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid 
C05C8., N = 2, Score = 370, P = 1.7e-34 

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 334, P = 1.8e-27 

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative 
exonuclease"; S.pombe chromosome I cosmid c637., N = 3, Score = 326, P 
= 2.8e-27 



>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo 
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 
Length = 547 

HSPs: 

Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 358/373 (95%), Positives = 358/373 (95%) 

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164 

MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 
Sbjct: 1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60 

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224 

           AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 
Sbjct: 61  AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120 
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Query: 225 NSPLFGLDCEM CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269 

NSPLFGLDCEM CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180 

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329 

DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240 

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389 

           SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300 

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449 

           LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360 

Query: 450 NCQTIKCLSNKEV 462 

           NCQTIKCLSNKEV 
Sbjct: 361 NCQTIKCLSNKEV 373 

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 175/179 (97%), Positives = 177/179 (98%) 

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597 

           L + +VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427 

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657 

           ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487 

Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716 

           STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546 
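
The percentages printed next to the identity and positive counts of the two HSPs can be recomputed from the raw counts. The sketch below assumes that the reported percentage is truncated rather than rounded, which is what the figures in this report suggest (358/373 is about 95.98% but is printed as 95%). The function name is hypothetical.

```python
def blast_percent(matches, aligned):
    """Percentage of matching positions, truncated to an integer."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // aligned


# Counts taken from the two HSPs against TREMBL:AC004381_4.
for matches, aligned in [(358, 373), (175, 179), (177, 179)]:
    print("%d/%d (%d%%)" % (matches, aligned, blast_percent(matches, aligned)))
```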



Pedant information for DKFZphtes3_15j3, frame 2 



Report for DKFZphtes3_15j3.2 



[LENGTH] 743 

[MW] 83536.58 

[pI] 8.87 

[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens 
Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YGL094c] 1e-10 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YGL094c] 1e-10 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10 

[PROSITE] MYRISTYL 5 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 8 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 16 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 

[KW] Alpha_Beta 



SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE 

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehhhhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY 

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL 

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh 
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SEQ LPPDAVLVGHSLDLDLPJaKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR 

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc 

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF 

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP 

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG 

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc 

SEQ SLSGLGLMGIKEEEESAGPGLCS 

PRD cccccccchhhhhhccccccccc 



Prosite for DKFZphtes3_15j3.2 

PS00001 219->223 ASN_GLYCOSYLATION PDOC00001
PS00001 419->423 ASN_GLYCOSYLATION PDOC00001
PS00002 723->727 GLYCOSAMINOGLYCAN PDOC00002
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005
PS00005 279->282 PKC_PHOSPHO_SITE PDOC00005
PS00005 287->290 PKC_PHOSPHO_SITE PDOC00005
PS00005 447->450 PKC_PHOSPHO_SITE PDOC00005
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005
PS00005 605->608 PKC_PHOSPHO_SITE PDOC00005
PS00005 630->633 PKC_PHOSPHO_SITE PDOC00005
PS00005 643->646 PKC_PHOSPHO_SITE PDOC00005
PS00005 658->661 PKC_PHOSPHO_SITE PDOC00005
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006
PS00006 421->425 CK2_PHOSPHO_SITE PDOC00006
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006
PS00006 630->634 CK2_PHOSPHO_SITE PDOC00006
PS00007 370->379 TYR_PHOSPHO_SITE PDOC00007
PS00008 27->33 MYRISTYL PDOC00008
PS00008 186->192 MYRISTYL PDOC00008
PS00008 575->581 MYRISTYL PDOC00008
PS00008 714->720 MYRISTYL PDOC00008
PS00008 720->726 MYRISTYL PDOC00008
PS00009 337->341 AMIDATION PDOC00009



Pfam for DKFZphtes3_15j3.2 



HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM * IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 

IY+ +++ +t +E+L + + F + + + +++D G+ + ++F +F++ 
Query 571 IYLSGVS-ETFKEQLLQEPRLFLGLEAVILPKDLKSGKQKKYCFLKFKS 618 

HMM EEDAekAIdeMNG. .meFmGRrlRV* 

+A+ A+ + G ++ GR + 
Query 619 FGSAQQALNILTGKDWKLKGRHALT 643 
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DKFZphtes3_15kll 



group: signal transduction 

DKFZphtes3_15kll encodes a novel 958 amino acid protein that is C-terminally identical to human 
KIAA0781 protein and has high similarity to protein kinases. 

The novel protein contains a protein kinase ATP-binding region signature and a 
serine/threonine protein kinase active-site signature. The related murine kinase was cloned 
from the myocardium of the developing heart. 

The new protein can find application in modulation of intracellular signal pathways dependent 
on this kinase. 

KIAA0781, 5' extension 

complete cDNA, complete cds, potential start at bp 97, EST hits 
Sequenced by GBF 
Locus: /map="11" 
Insert length: 4868 bp 

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 



1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 
51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGCGG CCCAGCATGG 
101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 
151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 
201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 
251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 
301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 
351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 
401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 
451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG 
501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 
551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 
601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 
651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 
701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 
751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 
801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 
851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 
901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 
951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 
1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 
1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 
1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 
1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA 
1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 
1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC 
1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 
1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 
1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 
1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 
1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 
1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 
1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 
1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 
1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 
1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG 
1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 
1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC 
1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 
1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 
2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 
2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 
2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 
2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 
2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 
2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 
2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 
2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 
2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT 
2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC 
2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA 
2551 CAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCCACCAC CACGACAGCC 
2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC 
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT 
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG 
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG 
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 
2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA 
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 
3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG 
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA 
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG 
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC 
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA 
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC 
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA 
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT 
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 
3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT 
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 
4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT 
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA 
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT 
4151 TATATTACTA ATAAAACTTA ACCAACACTT ACAATTCAGT CATCAAAGTA 
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA 
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT 
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC 
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG 
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT 
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT 
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC 
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG 
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA 
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA 
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT 
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA 
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA 
4851 CGCGTGAAAA AAAAAAAG 



BLAST Results 



Entry HSG4921 from database EMBL: 
human STS SHGC-37164. 
Score = 1605, P = 1.9e-66, identities = 349/369 

Entry AB018324 from database EMBL: 

Homo sapiens mRNA for KIAA0781 protein, partial cds. 
Score = 10725, P = 0.0e+00, identities = 2145/2145 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 2874 bp; peptide length: 958 
Category: known protein 



1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG 

51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV 
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE 
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK 
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD 
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH 
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL 
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA 
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV 
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH 
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD 
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ 
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ 
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ 
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ 
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF 
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL 
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP 
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ 
951 HNGYVLVN 

BLASTP hits 

No blastp hits available 

Alert BLASTP hits for DKFZphtes3_15kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15kll, frame 1 



Report for DKFZphtes3_15kll.1 



[LENGTH] 926 

[MW] 103915.77 

[pI] 5.70 

[HOMOL] TREMBL:AB018324_1 gene: "KIAA0781"; product: "KIAA0781 protein"; Homo sapiens 
mRNA for KIAA0781 protein, partial cds. 0.0 



[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 4e-58
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-56
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, Y 3e-56
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w]
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 5e-42
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-34
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 3e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 2e-23
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 3e-19
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 4e-18
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-16
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 2e-14
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{ FUNCAT ] 08.99 other intracellular-transport activities IS. cerevisiae, YNL183C] 

2e-14 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 5e-14 

[FUNCAT] c energy conversion [M. genitalium, MG109] 2e-12 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 1e-10 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 1e-10 

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 1e-10 

[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 4e-09 

[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 1e-07 

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 1e-07 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL154c] 2e-04 

[BLOCKS] BL00415A Synapsins proteins 

[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-78 

[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 1e-81 

[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 5e-89 

[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 5e-86 

[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 3e-80 

[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 6e-70 

[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-95 

[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 7e-71 

[SCOP] d1ydse_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 2e-96 

[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 2e-72 

[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 5e-97 

[SCOP] d2hckb3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 2e-68 

[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53 

[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 3e-78 

[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 1e-58 

[EC] 2.7.1.117 Myosin-light-chain kinase 3e-49

[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 4e-78

[EC] 2.7.1.38 Phosphorylase kinase 3e-41

[EC] 2.7.1.37 Protein kinase 7e-45

[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42

[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78

[PIRKW] phosphotransferase 3e-93

[PIRKW] nucleus 2e-74

[PIRKW] calcium 2e-40

[PIRKW] transferase 3e-33

[PIRKW] duplication 2e-32

[PIRKW] tandem repeat 7e-45

[PIRKW] phorbol ester binding 4e-33

[PIRKW] zinc 4e-33

[PIRKW] ion transport 1e-32

[PIRKW] cell cycle control 1e-45

[PIRKW] serine/threonine-specific protein kinase 2e-97

[PIRKW] oncogene 1e-34

[PIRKW] phospholipid binding 2e-32

[PIRKW] autophosphorylation 2e-74

[PIRKW] brain 6e-36

[PIRKW] heterotetramer 8e-38

[PIRKW] mitosis 1e-45

[PIRKW] polymer 5e-41

[PIRKW] magnesium 6e-80

[PIRKW] ATP 2e-97

[PIRKW] polyprotein 1e-34

[PIRKW] alternative initiators 2e-31

[PIRKW] phosphoprotein 2e-74

[PIRKW] apoptosis 8e-38

[PIRKW] cGMP binding 4e-33

[PIRKW] glycoprotein 3e-36

[PIRKW] skeletal muscle 8e-38

[PIRKW] protein kinase 2e-50

[PIRKW] testis 5e-41

[PIRKW] cAMP binding 8e-38

[PIRKW] transforming protein 4e-33

[PIRKW] purine nucleotide binding 7e-52

[PIRKW] calcium binding 7e-45

[PIRKW] alternative splicing 5e-42

[PIRKW] P-loop 7e-52

[PIRKW] lipoprotein 8e-38

[PIRKW] proto-oncogene 4e-33

[PIRKW] segmentation 1e-34

[PIRKW] core protein 1e-34



WO 01/12659 PCT/IB00/01496



[PIRKW] muscle 8e-38

[PIRKW] myristylation 8e-38

[PIRKW] EF hand 7e-45

[PIRKW] cell division 3e-49

[PIRKW] homodimer 1e-32

[PIRKW] calmodulin binding 5e-42

[SUPFAM] ribosomal protein S6 kinase II 1e-34

[SUPFAM] calcium-dependent protein kinase 7e-45

[SUPFAM] AMP-activated protein kinase 6e-80

[SUPFAM] protein kinase akt 3e-36

[SUPFAM] protein kinase SPK1 7e-41

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99

[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42

[SUPFAM] calmodulin repeat homology 7e-45

[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33

[SUPFAM] protein kinase DUN1 6e-36

[SUPFAM] protein kinase C zeta 4e-33

[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34

[SUPFAM] death-associated protein kinase 8e-38

[SUPFAM] pleckstrin repeat homology 3e-36

[SUPFAM] ankyrin repeat homology 8e-38

[SUPFAM] protein kinase homology 8e-99

[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38

[SUPFAM] protein kinase C zinc-binding repeat homology 4e-33

[SUPFAM] protein kinase C delta 2e-32

[SUPFAM] cGMP-dependent protein kinase 3e-33

[SUPFAM] protein kinase cdr1 1e-45

[SUPFAM] kinase-related transforming protein 2e-50

[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-42

[SUPFAM] kinase interaction domain homology 7e-41

[SUPFAM] gag-akt polyprotein 1e-34

[PROSITE] PROTEIN_KINASE_ATP 1

[PROSITE] MYRISTYL 3

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 15

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 10

[PROSITE] ASN_GLYCOSYLATION 2

[PROSITE] PROTEIN_KINASE_ST 1

[PFAM] Eukaryotic protein kinase domain

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 12.31 %





SEQ MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVN

SEG

1ctpE EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC

SEQ LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLNESEAR

SEG

1ctpE HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHH

SEQ RKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFKSGELLATWCGSP

SEG

1ctpE HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC-CCCCCG

SEQ PYAAPEVFEGQQYEGPQLDIWSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRIPYFM

SEG

1ctpE GGCCHHHHHCCCBC-HHHHHHHHHHHHHHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTT

SEQ SEDCEHLIRRMLVLDPSKRLTIAQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV

SEG

1ctpE CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG

SEQ LRLMHSLGIDQQKTIESLQNKSYNHFAAIYFLLVERLKSHRSSFPVEQRLDGRQRRPSTI

SEG

1ctpE

SEQ AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT

SEG

1ctpE

SEQ PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ

SEG xxxxxxxxxxx

1ctpE

SEQ RRHTLSEVTNQLVVMPGAGKIFSMWDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML

SEG

1ctpE






SEQ ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRSPVSFREGRRASDTSLTQGIVAFRQ

SEG

1ctpE

SEQ HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST 

SEG xxxxxxxxxxxxxxxx .... xxxxxxxxxxxx . 

IctpE 

SEQ LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK 

SEG xxxxxxxxxxxxx 

IctpE 

SEQ RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP 

SEG xxxxxxxxxxx xxxxxxxxxxxxxxx 

IctpE 

SEQ SSEQMQYSPFLSQYQEHQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

IctpE 

SEQ PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL

SEG xxx 

IctpE 

SEQ SELPGLFDCEMLDAVDPQHNGYVLVN 

SEG 



Prosite for DKFZphtes3_15k11.1

PS00001   115->119   ASN_GLYCOSYLATION    PDOC00001
PS00001   320->324   ASN_GLYCOSYLATION    PDOC00001
PS00004   258->262   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   355->359   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   481->485   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   584->588   CAMP_PHOSPHO_SITE    PDOC00004
PS00005   257->260   PKC_PHOSPHO_SITE     PDOC00005
PS00005   339->342   PKC_PHOSPHO_SITE     PDOC00005
PS00005   420->423   PKC_PHOSPHO_SITE     PDOC00005
PS00005   475->478   PKC_PHOSPHO_SITE     PDOC00005
PS00005   534->537   PKC_PHOSPHO_SITE     PDOC00005
PS00005   545->548   PKC_PHOSPHO_SITE     PDOC00005
PS00005   554->557   PKC_PHOSPHO_SITE     PDOC00005
PS00005   567->570   PKC_PHOSPHO_SITE     PDOC00005
PS00005   579->582   PKC_PHOSPHO_SITE     PDOC00005
PS00005   670->673   PKC_PHOSPHO_SITE     PDOC00005
PS00006    42->46    CK2_PHOSPHO_SITE     PDOC00006
PS00006    54->58    CK2_PHOSPHO_SITE     PDOC00006
PS00006   128->132   CK2_PHOSPHO_SITE     PDOC00006
PS00006   292->296   CK2_PHOSPHO_SITE     PDOC00006
PS00006   359->363   CK2_PHOSPHO_SITE     PDOC00006
PS00006   394->398   CK2_PHOSPHO_SITE     PDOC00006
PS00006   450->454   CK2_PHOSPHO_SITE     PDOC00006
PS00006   458->462   CK2_PHOSPHO_SITE     PDOC00006
PS00006   484->488   CK2_PHOSPHO_SITE     PDOC00006
PS00006   503->507   CK2_PHOSPHO_SITE     PDOC00006
PS00006   515->519   CK2_PHOSPHO_SITE     PDOC00006
PS00006   534->538   CK2_PHOSPHO_SITE     PDOC00006
PS00006   579->583   CK2_PHOSPHO_SITE     PDOC00006
PS00006   878->882   CK2_PHOSPHO_SITE     PDOC00006
PS00006   893->897   CK2_PHOSPHO_SITE     PDOC00006
PS00007   672->680   TYR_PHOSPHO_SITE     PDOC00007
PS00007   100->108   TYR_PHOSPHO_SITE     PDOC00007
PS00008   372->378   MYRISTYL             PDOC00008
PS00008   871->877   MYRISTYL             PDOC00008
PS00008   905->911   MYRISTYL             PDOC00008
PS00009   134->138   AMIDATION            PDOC00009
PS00009   582->586   AMIDATION            PDOC00009
PS00107    26->50    PROTEIN_KINASE_ATP   PDOC00100
PS00108   138->151   PROTEIN_KINASE_ST    PDOC00100
HMM_NAME Eukaryotic protein kinase domain

HMM *YeigRiIGeGsFGtV¥kCiWr.TGeIVAIKII)ckrsms FlREI
Y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+
Query 20 YDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVNLEKIYREV 68

HMM qlMRrLnHPNIIRFYDwFedddDHIYMIMEYMeGGDLFDYIr rngpMaEw
QIM++L+HP+II++Y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E
Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 117

HMM elr flMyQILrGMeYLHSMgllHRDLKPENILIDeNgqlKIcDFGLARqM
E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++
Query 118 EARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFF 167

HMM nnYerMt t f CGTPWYMMAPEVI Img . nyYttkVDMWSFGCILWEMMTGep
+++E++ x CG+P+Y APEV +G +Y +++ D+WS+G++L+ +++G +
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 215

HMM pFyddnMemlmrliqrfrrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI
pp++ ++ + + +++ R+++++ +SE++ +++R+++ +DP+KR+T+ QI
Query 216 PFDGPTLPILRQRVLEGRFRIPYFMSEDCEHLIRRMLVLDPSKRLTIAQI 265

HMM LnHPWF*
+H W+
Query 266 KEHKWM 271
DKFZphtes3_17f10
group: testes derived

DKFZphtes3_15j18 encodes a novel 710 amino acid protein with weak similarity to neurofilament
proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to neurofilament proteins

Sequenced by GBF

Locus: unknown

Insert length: 2533 bp

Poly A stretch at pos. 2507, no polyadenylation signal found

1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 
51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 

101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 

151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 

201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 

251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 

301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 

351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 

401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 

451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 

501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 

551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 

601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 

651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 

701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 

751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 

801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 

851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG

901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 

951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 
1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 
1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 
1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 
1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 
1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 
1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 
1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 
1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 
1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 
1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 
1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 
1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 
1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 
1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 
1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 
1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 
1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 
1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 
1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 
1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 
2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 
2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 
2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 
2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 
2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 
2251 ATTAAAGGGC ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 
2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG 
2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 
2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 
2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 
2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA 
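
The relationship between the insert sequence above and the peptide reported for frame 3 can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and 1-based positions as used in this listing; the function name `translate` and the variable `first_row` are illustrative and not part of the original report. Only the first row of the insert (positions 1 to 50) is used here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start):
    """Translate from 1-based position `start` until a stop codon or the end of `dna`."""
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# First row of the insert as printed above (positions 1-50), spaces removed.
first_row = "CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA".replace(" ", "")

# The ORF is reported to start at position 18 (the ATG at positions 18-20).
print(translate(first_row, 18))
```

Under these assumptions the printed residues agree with the start of the frame 3 peptide (MDRSQQTSRT G...).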



BLAST Results 
No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 
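
The ORF coordinates and the peptide length quoted above are mutually consistent, which can be verified with a few lines of arithmetic. This is a small sketch, assuming that the coordinates are 1-based and inclusive and that the quoted range excludes the stop codon (an assumption inferred from the numbers in this listing, not stated by the report); `orf_summary` is a hypothetical helper name.

```python
def orf_summary(start, end):
    """Return reading frame (1-3) and codon count for a 1-based, inclusive ORF range."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {"frame": (start - 1) % 3 + 1, "codons": length_nt // 3}


# ORF from 18 bp to 2147 bp, as reported for this clone.
print(orf_summary(18, 2147))
```

For the range 18 to 2147 this gives frame 3 and 710 codons, matching the reported frame and peptide length.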



1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS
51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP
101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA
151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV
201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS
251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL
301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL
351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL
401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ
451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA
501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE
551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD
601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV
651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH
701 IELKQRPPEL



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_17f10, frame 3

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P = 7.4e-43

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N = 1, Score = 475, P = 1e-42

>PIR:A37221 neurofilament triplet H protein - rat
Length = 1,072

HSPs:

Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43
Identities = 185/622 (29%), Positives = 320/622 (51%)


Query: 


33 


Sbjct: 


436 


Query: 


93 


Sbjct: 


496 


Query: 


153 


Sbjct: 


555 


Query: 


212 


Sbjct: 


610 


Query: 


269 


Sbjct: 


670 


Query: 


328 


Sbjct: 


722 


SE +1 V+ + +

E G + + TS

+ A K + AE + P+ K PA+++ P ++ A

V+ P+T ++P + A A++ +V+ P ++P + PAE + P+
- VK - S PATVKS P AEAKS PAEAKS PAEVKS PAT VKS PGEAKSP AEAKS P AE 609

P++ +SP E + PAE

++P +V+ P ++ +E +

KSP+ V+ E +SP+ K+P+

++PAE ++P
-PVVAKS PAEAKS P 721

+ PA+ ++PAEA+SP+ E S E+A + V+

Query: 384 PLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPLSEET-SAEEAPA-EVQSPSAKGV 440
LAE + E+A + ++ I+ PA+ ++P +A+SP+ EE S E+A +V+SP AK
Sbjct: 776 SLAEAKSPEKAKSPVK--EEIKPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTP 833



Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA---EEAPAEVQPPPAEEAPAE 494
+ EEA P +++ P ++ A EEA + + TE A EE + V+ A+E P +
Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPKK 893

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553
V+ P EV+ +EAP E Q P AEE + P +++P E + EEA
Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKK---EEAKE 948

Query: 554 EVQPPPAEEAPAEV----QSLP---AEETPIEETL--AAVHSPPADDVPAEEASVD-KHS 603
+ P EE PA++ ' ++ P AE+ +E + P ++VPA D K
Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008

Query: 604 PPADLLLTEEFPIGEASAEVSPP--PSEQT-PEDEALVENVSTEFQSPQ 649
+ EE P +A A+ P E + P+ E ++ ST + + Q
Sbjct: 1009 KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDQKDSQ 1057

Score = 473 (71.0 bits), Expect = 4.8e-42, P = 4.8e-42
Identities = 184/628 (29%), Positives = 310/628 (49%)

Query: 18 IPPVEKVDKEQQTYFSESEIVVISRP---DSSSTKSKEDALKHKSSGKIFASEHPEFQPA 74
I VEK +KE ++E + ++ + E+ + + G+ A+ P + A
Sbjct: 440 IKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 499

Query: 75 TNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPS 134
+ +E + + + + KP E+ E P + AK + AE+P+
Sbjct: 500 ASPEKET-KSPVKEEAKSPAEAKSPA EAKS PAEAKSPAE VKS PAEVK- S P AEAKS PA 554

Query: 135 TEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQ-PSTEETPDAEAATAVAENSVKVQPPP 193
K PA+++ P ++ A+A+ ++ +V+ P+T ++P + A A++ +V+ P
Sbjct: 555 EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 614

Query: 194 AEEAPL-VEFPAEIQPPSAEESPS-VELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS 250
++P + PA ++ P +SP+ + AE+ P+ +SP E + PAE+ P KSP+
Sbjct: 615 EAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPA 674

Query: 251 -VELLGEIRSPSAQKAPIEVQ-PLPAEGALE-EAPAKVEPPTVEETLAEVQPLLPEEAPR 307
+ E++SP++ K+P E + P A+ E + + P + P + + AE +P ++P
Sbjct: 675 EAKSPVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPA 734

Query: 308 EEARELQLSTAME--TPAE-EAPTEFQSP LP-KE TTAEEASAEIQLLAATE-- 354
E + + E +PAE ++P E +SP P KE + AE S E E
Sbjct: 735 EAKSPAEAKSPAEAKSPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEI 794

Query: 355 -PPAD-ETPAEARSPLSEET-SAEEAHA-EVQSPLAEETTAEEAS--AEIQLLAAIEAPA 408
PPA+ ++P +A+SP+ EE S E+A +V+SP A+ EEA A+I+ + ++PA
Sbjct: 795 KPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPA 854

Query: 409 DETPAEAQSPLSEETSAEE-APA--EVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAA 465
E EA+SP EET E+ AP EV+SP +EE + +PP E EE + A
Sbjct: 855 KE EAKSPEKEETRTEKVAPKKEEVKSP VEEVKAK-EPPKKVE EEKTPA 901

Query: 466 IQLLAATEASAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAE 525
E+ +EAP E Q P AEE + P +++P E + A+E A P E
Sbjct: 902 TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKKEEAKEKKAAA---PEE 956

Query: 526 EAPAEV----QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581
E PA++ + P EfA P++ PSE + P EE PA + +E E+
Sbjct: 957 ETPAKLGVKEEAKPKEKAEDAKAKEPSK--PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014

Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636
P EE DK P TE+ ++ + PSE+ PED+A
Sbjct: 1015 KPEEKPKMQAKAKEE---DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067

Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36
Identities = 162/540 (30%), Positives = 275/540 (50%)

Query: 135 TEKFPAKIQPPLVEEATAKAEPR-----PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189

TE P KI P + K+E + +E+ V V+ TEE E T E +

Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTE--EEDKEA 474

Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE--SPSE-EPPAEILPPPAE 246

Q EEA A P AEE+ S E E P EE SP+E + PAE P

Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE--KETKSPVKEEAKSPAEAKSPAEAKSPAEA 532

Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

KSP+ E++SP+ K+P E + PAE ++PA+V+ P ++ AE + ++P 
Sbjct: 533 KSPA EVKS PAEVKS PAEAKS- PAEA KS PAEVKS PATVKS PAEAKS PAEAKS P 583 
Query: 307 REEARELQLSTAME--TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361

E + + E +PAE ++P E +SP+ ++ AE S A ++ + PA+ ++P

Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643

Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPL 419 

AE +SP + ++ E ++ AEV+SP+ ++ AE A + ++ +++PA ++P+EA+SP 
Sbjct: 644 AEVKSPATVKSPVEAKSPAEVKSPVTVKSPAE-AKSPVE VKSPASVKSPSEAKSP- 697 

Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478 

+ + + PAE +SP AK + ++P E +PP+ ++ AE S A A + A A+ 
Sbjct: 698 -AGAKSPAEAKSPVVAKSPAEAKS PAEAKPPAEAKSPAEAKSPAE AKSPAEAK- 749

Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA--EVQPPPAEEAPAEVQPP 534

+PAE + P ++P ++P EA AE+P ++P E++PP ++P + + P 
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809 

Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA--PAEVQSLPAEETPIEETLAAVHSPPADDV 592

EEA + + + E t P EEA PA+++S ++P +E SP ++

Sbjct: 810 MKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKEE---AKSPEKEET 866

Query: 593 PAEEASVDKHS--PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650

E++K P + ++EP + E P + +T E++ EQP+

Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP--KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924

Query: 651 AGI PAVKLGS VVLEGEAKFEEVSK 674 

+ GEAK EE +

Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948 

Score = 406 (60.9 bits), Expect = 1.7e-34, P = 1.7e-34
Identities = 123/390 (31%), Positives = 213/390 (54%)

Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA---EA 364

E+ E+Q++ E EE E Q +E AEE E AT PPA+E + E
Sbjct: 455 EQTEEIQVT EEVTEEEDKEAQGE- -EEEEAEEGGEEA ATTSPPAEEAASPEKET 506

Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP + 
Sbjct: 507 KSPVKEEAKSP AEAKSPAEAKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKS PAEVK 563 

Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480 

+ A ++PAE +SP+ AK + ++P ++ P GE + EA + ++ + EA ++P 
Sbjct: 564 SPATVKSPAEAKSPAEAKSPAEVKSPATVKSP-GEAKSPAEAKSPAEVKSPVEA KSP 619

Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540 

AE + P + ++P E + P ++PAEV+ P ++P E + P ++P V+ P ++P 
Sbjct: 620 AEAKSPASVKSPGEAKS PAEAKS PAEVKS PATVKSPVEAKS PAEVKS PVTVKS PAEAKSP 679 

Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPAD-DVPAEEASV 599 

EV+ P + ++PSE + P ++PAE +S ++P E A PPA+ PAE S 
Sbjct: 680 VEVKS PAS VKSPSEAKSPAGAKS PAEAKS PVVAKSPAEAKSPAEAKPPAEAKS PAEAKSP 739 

Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659

+ PA+ E ++ EV P + + P E ++++ E +SP+ A P VK

Sbjct: 740 AEAKSPAEAKSPAE---AKSPVEVKSPEKAKSPVKEG-AKSLA-EAKSPEKAKSP-VK-E 792

Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697 

+ E K E +K S +K+ + + + +A TL+++S 
Sbjct: 793 EIKPPAEVKSPEKAK— SPMKEEAKSPE-KAKTLDVKS 827 

Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18
Identities = 124/420 (29%), Positives = 199/420 (47%)

Query: 252 ELLGEI RS PSAQKAPI EVQPL PA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

ELLG+I+ A +A + + A AL E A++E TV+ TL + 

Sbjct: 236 ELLGQIQGCGAAQAQAQAEARDALKCDVTSALREIRAQLEGHTVQSTLQSEEWFRVRLDR 295 

Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366 

EA ++ + AM + EE TE++ L TT E++ L +T+ + +E 

Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT ELEALKSTKESLERQRSELED 347 

Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA--EIQLLAAIEAPAD-ETPAEAQSPLSEE 422 

+ S ++A ++ + L TEA+ EQL++ DEA + EE 
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-THTKWEMAAQLREYQDLLNVKMALDIEI AAYRKLLEGEE 406 

Query: 423 TSAEEAPAEV QSPS-AKGVSIE-EAPLELQPPSGEETT-AEEASAAIQLLA-A 471

p+ + PS + + ++ E +++ S +ET EE + IQ+ 

Sbjct: 407 CRIGFGPSPFSLTEGLPKIPSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEV 466 

Query: 472 TEASAEEAPAEVQPPPAEEAPAEVQP— PPAEEAPA— -EVQPPPAEEA— PAEVQPPPA 524 

TE +EA E + AEE E PPAEEA + E + P EEA PAE + P 

Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTSPPAEEAASPEKETKSPVKEEAKSPAEAKSPAE 525

Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582

+ + PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A 
Sbjct: 526 AKSPAEAKSPAEVKS PAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584 

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 

V SP PES + PA++ E + + AE PS ++P E ++ E 
Sbjct: 585 EVKSPATVKSPGEAKSPAEAKSPAEVKS PVE AKSPAEAKSPASVKSPGEAKSPAEAK 641 

Query: 642 S-TEFQSPQVAGI P 654 

S E +SP P 
Sbjct: 642 SPAEVKSPATVKSP 655 

Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18
Identities = 115/364 (31%), Positives = 166/364 (45%)

Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE +PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 
Sbjct: 705 EAKSPVVAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762

Query: 168 PSTEETPDAEAATAVAE— NSVKVQPPPAEEA— PL-VEFPAEIQPPSAEE— SPSVELL 220 

P ++P E A ++AE + K + P EE P V+P + + P EE SP 
Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKS PEKAKT 822 

Query: 221 AEILPPSAEESPSEEP— PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE— 275 

+ + P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
Sbjct: 823 LDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKE— EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+EE AK P VEE E P P+ +E ++ A + AEE P TE 

Sbjct: 880 SPVEEVKAKEPPKKVEE EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE--ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

Sbjct: 937 SPGEAKKEEAKEK KAAA — PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAIEAPADETPAEAQSPLSEETSAEEAPAEVQSPSA-KGVSIEEAPLE 448 

EE A + E E+ + p + + EE Q PS K E++ 

Sbjct: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

Sbjct: 1052 DQKDSQPSEKAPEDKAA 1068 

Pedant information for DKFZphtes3_17f10, frame 3

Report for DKFZphtes3_17f10.3

[LENGTH] 710

[MW] 75131.94

[pI] 4.02

[KW] All Alpha

[KW] LOW_COMPLEXITY 34.08 %

SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS

SEG

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhccccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG XXXX xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccceccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 
SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS

SEG XXXX .... XXXXXXXXXXXX XXXXXXXXXXXX xxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 

SEG xxxxxxxxxxx xxxxxxxxxxx XXXXXXXX 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS 

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIELKQRPPEL 

SEG 

PRD eeechhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 



(No Prosite data available for DKFZphtes3_17f10.3)
(No Pfam data available for DKFZphtes3_17f10.3)
DKFZphtes3_17117

group: metabolism

DKFZphtes3_17117 encodes a novel 626 amino acid protein with similarity to transketolases (EC
2.2.1.1).

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new
testis-specific transketolase. Transketolase requires thiamin pyrophosphate as cofactor and shows a
wide specificity for both reactants, e.g. converts hydroxypyruvate and R-CHO into CO(2) and
R-CHOH-CO-CH(2)OH.

The new protein can find application in modulation of metabolic pathways involving this
transketolase activity and as a new enzyme for biotechnological production processes.

strong similarity to transketolases

few EST hits (all from testis or pooled libraries containing testis)
testis specific transketolase?



Sequenced by GBF 
Locus: unknown 



Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630



1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT 

51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG 

101 CCACGTGTGC CTCTGGTTCT GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 

151 GAGGTCGTGT CTGTCCTCTT CTTCCACACG ATGAAGTATA AACAGACAGA 

201 CCCAGAACAC CCGGACAACG ACCGGTTCAT CCTCTCCAGG GGACATGCTG 

251 CTCCTATCCT CTATGCTGCT TGGGTGGAGG TGGGTGACAT CAGTGAATCT 

301 GACTTGCTGA ACCTGAGGAA ACTTCACAGC GACTTGGAGA GACACCCTAC 

351 CCCGCGATTG CCGTTTGTTG ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 

401 TAGGTACTGC ATGTGGAATG GCTTATACTG GCAAGTACCT TGACAAGGCC 

451 AGCTACCGGG TGTTCTGCCT TATGGGAGAT GGCGAATCCT CAGAAGGCTC

501 TGTGTGGGAG GCTTTTGCTT TTGCCTCCCA CTACAACTTG GACAATCTCG 

551 TGGCGGTCTT CGACGTGAAC CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 

601 GAGCATGGCG CAGACATCTA CCAGAATTGC TGTGAAGCCT TTGGATGGAA 

651 TACTTACTTA GTGGATGGCC ATGATGTGGA GGCCTTGTGC CAAGCATTTT 

701 GGCAAGCAAG TCAAGTGAAG AACAAGCCTA CTGCTATAGT TGCCAAGACC 

751 TTCAAAGGTC GGGGTATTCC AAATATTGAG GATGCAGAAA ATTGGCATGG 

801 AAAGCCAGTG CCAAAAGAAA GAGCAGATGC AATTGTCAAA TTAATTGAGA 

851 GTCAGATACA GACCAATGAG AATCTCATAC CAAAATCGCC TGTGGAAGAC 

901 TCACCTCAAA TAAGCATCAC AGATATAAAA ATGACCTCCC CACCTGCTTA 

951 CAAAGTTGGT GACAAGATAG CTACTCAGAA AACATATGGT TTGGCTCTGG 

1001 CTAAACTGGG CCGTGCAAAT GAAAGAGTTA TTGTTCTGAG TGGTGACACG 

1051 ATGAACTCCA CCTTTTCTGA GATATTCAGG AAAGAACACC CTGAGCGTTT 

1101 CATAGAGTGT ATTATTGCTG AACAAAACAT GGTAAGTGTG GCACTAGGCT 

1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 

1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 

1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 

1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 

1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 

1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG 

1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 

1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 

1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 

1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 

1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 

1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 

1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 

1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 

1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 

1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 

1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 

2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 

2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 

2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 

2151 AAATAAAACA ACTACCTAAT ACAAATATTT CTGATAAGAC TACAAATATC 

2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 

2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 

2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 

2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 

2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA 

2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA 
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTCAGA 

2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA 

2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA 

2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



NO BLAST result 



Medline entries 



96214928: 

Amplification of the transketolase gene in desensitization-resistant mutant
Y1 mouse adrenocortical tumor cells.



99123875: 

Properties and functions of the thiamin diphosphate dependent enzyme 
transketolase. 



Peptide information for frame 1 



ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATP_GTP_A (595-603) 



1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS
51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN
101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV
151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA
201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR
251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI
301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST
351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA
401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF
451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR
501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII
551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG
601 KTSELLDMFG ISTRHIIAAV TLTLMK



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_17117, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1, 
Score = 2222, P = 2.5e-230 

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2202, P = 3.3e-228 

TREMBL:RN092S6_1 product: "transketolase"; Rattus norvegicus 
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202, 
P = 3.3e-228 

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2200, P = 5.3e-228 



>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 
Length = 623 



HSPs: 



Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66
         KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP +
Sbjct: 6 KPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRN 65

Query: 67 PDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVDVATGSL 126
          P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL
Sbjct: 66 PHNDRFVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSL 125

Query: 127 GQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLVAVFDVN 186
            GQGLG ACGMAYTGKY DKASYRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N
Sbjct: 126 GQGLGAACGMAYTGKYFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDIN 185

Query: 187 RLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPTAIVAKT 246
            RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT
Sbjct: 186 RLGQSDPAPLQHQVDIYQKRCEAFGWHTIIVDGHSVEELCKAFGQAKHQPTAIIAKT 242

Query: 247 FKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQISITDIK 306
            FKGRGI I ED E WHGKP+PK A+ I++ I SQ+Q+ + ++ P ED+P + I +I+
Sbjct: 243 FKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQSKKKILATPPQEDAPSVDIANIR 302

Query: 307 MTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHPERFIEC 366
            M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC
Sbjct: 303 MPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSELFKKEHPDRFIEC 362

Query: 367 IIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSHCGVSTG 426
            IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G
Sbjct: 363 YIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVSIG 422

Query: 427 EDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPETAVIYT 486
            EDG SQMALEDLAMFRS+P TVFYPSD ++TE A+ LAANTKG+CFIRTS+PE A+IY+
Sbjct: 423 EDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYS 482

Query: 487 PQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFTIKPLDA 546
            E+F++GQAKVV + D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD
Sbjct: 483 NNEDFQVGQAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDR 542

Query: 547 ATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRGKTSELL 606
            I+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V +LAVS VP+ GK +ELL
Sbjct: 543 KLILDSARATKGRILTVEDHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELL 602

Query: 607 DMFGISTRHIIAAV 620
            MFGI I+ AV
Sbjct: 603 KMFGIDKDAIVQAV 616
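
The percentages in the HSP summary follow directly from the identity and positive counts. The lines below are a minimal sketch, assuming the report truncates rather than rounds (417/614 is about 67.9 %, yet it is shown as 67 %); the function name is illustrative and not part of any BLAST tool.

```python
def hsp_percent(matches: int, aligned: int) -> int:
    """Percentage of matching alignment columns, truncated to an integer."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts taken from the HSP summary of the TKT_MOUSE hit.
identity = hsp_percent(417, 614)
positives = hsp_percent(501, 614)
print(f"Identities = 417/614 ({identity}%), Positives = 501/614 ({positives}%)")
```
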



Pedant information for DKFZphtes3_17117, frame 1 



Report for DKFZphtes3_17117.1 



[LENGTH] 626
[MW] 67877.52
[pI] 5.90
[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0
[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17
[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09
[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05
[BLOCKS] BL00801F
[BLOCKS] BL00801E
[BLOCKS] BL00801D Transketolase proteins
[BLOCKS] BL00801C Transketolase proteins
[BLOCKS] BL00801B Transketolase proteins
[BLOCKS] BL00801A Transketolase proteins
[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21
[EC] 1.2.4.1 Pyruvate dehydrogenase (lipoamide) 8e-11
[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10
[EC] 2.2.1.1 Transketolase 0.0
[EC] 2.2.1.3 Formaldehyde transketolase 1e-20
[PIRKW] transferase 0.0
[PIRKW] flavoprotein 2e-07
[PIRKW] Calvin cycle 1e-40
[PIRKW] heterotetramer 2e-07
heterotetramer 2e-07 
[PIRKW] pentose phosphate pathway 0.0
[PIRKW] magnesium 1e-40
[PIRKW] thiamine pyrophosphate 0.0
[PIRKW] oxidoreductase 7e-12
[PIRKW] fatty acid biosynthesis 4e-10
[PIRKW] mitochondrion 2e-07
[PIRKW] peroxisome 1e-20
[PIRKW] homodimer 1e-40
[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06
[SUPFAM] pyruvate dehydrogenase (lipoamide) beta chain 7e-12
[SUPFAM] ferredoxin 2 [4Fe-4S]-related protein 8e-47
[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0
[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08
[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47
[SUPFAM] hypothetical protein C2814 2e-21
[SUPFAM] transketolase 0.0
[PROSITE] ATP_GTP_A 1
[PFAM] Transketolase
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 3.04 %

SEQ MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK
SEG
1ngsB HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT

SEQ QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD
SEG
1ngsB TTTTTTTTTCEEEETTGGGHHHHHHHHHHHCTTCHHHHHTTTTTTTTTTTTTTTTTTTTC

SEQ VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLV
SEG
1ngsB CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTTEE

SEQ AVFDVNRLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT
SEG
1ngsB EEEEECCEETTEEGGGCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE

SEQ AIVAKTFKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQI
SEG
1ngsB EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHKHHHHHHHHKCHHH

SEQ SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHP
SEG
1ngsB HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHHHHTTTTTEEEEETTTHHHHCCTTCEECCG

SEQ ERFIECIIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSH
SEG xxxxxxxxxxxxxxxxxxx
1ngsB GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGHHHHHHHHHHCTTTEEEEEC

SEQ CGVSTGEDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPE
SEG

SEQ TAVIYTPQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFT
SEG
1ngsB CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE

SEQ IKPLDAATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG
SEG
1ngsB

SEQ KTSELLDMFGISTRHIIAAVTLTLMK
SEG
1ngsB
Pfam for DKFZphtes3_17117.1 



HMM_NAME Transketolase 

HMM * vNtlRiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN
      +N++RI ++ A + +SG ++++++A++ VL++++M+++++DP P+
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT
      +DRF+LS GHA+++LY+ H + G ++++DL+++R++HS++ +HP ++
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117

HMM PGVEVTTGPLGQGIaNaVWMAIAERnLAATYNRPGFDIf DHYTYCFMGDG
      ++ +V+TG+LGQG++ +++++Y++++ D+++++++C+MGDG
Query 118 FV-DVATGSLGQGLG TACGMAYTGKYLDKAS YRVFCLMGDG 157

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSIDGdTdlWFqEDtYakRF
      + +EG++WEA ++A+H++L+N++A +D NR++++G++++ + D+Y+ +
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADIYQNCC 207

HMM EAYGWHVIEVEnDGHDvEelcaAIEeAKaekDRPTLIiCRTVIGYGSPNJc
      EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN
Query 208 EAFGWNTYLV--DGHDVEALCQAFWQASQVKNKPTAIVAKTFKGRGIPNI 255

HMM QGTHdWHGA PLGe D*
      ++ + WHG+P +++
Query 256 EDAENWHGKPVPKE 269

HMM * PqWePnddkl ATRKASQqaLeaiGPaLPEf WGGSADLTPSNLTrWKGmv
      P++++ +DKIAT K+++ AL+++G A +++ +S+D+ +S++++++ ++
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKE 358

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGIAIHGgNFRPYGGT
      + + R+I+ + I+E++M++++ G+A++G+ ++++ G
Query 359 H PERFIECIIAEQNMVSVALGCATRGR-TIAFAGA 392

HMM FMMFyDYARPAIRMAALMelPVIWVWTHDSIGLGEDGPTHQPVEHLAHFR
      F++F+++A++++RM A++ +++++++H++++ GEDG +++++E+LA+FR
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442

HMM alPNMsVWRPCDgNETayAWylAvERehTPtiLILSRQNLPQIErNPrqf
      +IPN +V++P+D+ T+ A YLA+++++ +++++S ++ +++++ P f
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490

HMM ekvaRGGYVLkDmdnePDVILIATGSEMELAvaAAKlLadEGIkaRWSM
      +++++++tV + + + V++I++G+++++A++AA+ L+ +GI +RV+++
Query 491 FEIGQAKVVRHGVN--DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538

HMM PCTeWFD kQDeEYReSVLPDhVPqRVaVEmGvtWCWYKYVGqq
      ++++++D + ++++R +++DH++ +++++++V ++ +++ +
Query 539 FTIKPLDAATIISSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587

HMM GalfGMNrFGESSGKAPpevLYkMFGFTPENI*
      + +++ +++ ++ +L+ MFG+ +I
Query 588 VHQLAVSGVPQR GKTSELLDMFGISTRHI 616
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DKFZphtes3_17nl2 



group: transcription factors 

DKFZphtes3_17nl2.1 encodes a novel 804 amino acid protein which is nearly identical to mouse 
and trout SOX-LZ. 

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the 
regulation of developmental processes such as germ layer formation, organ development and cell type 
specification. Deletion or mutation of Sox proteins often results in developmental defects and 
congenital disease in humans. Sox proteins perform their function in a complex interplay with 
other transcription factors in a manner highly dependent on cell type and promoter context. 
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper. 

The new protein can find application in modulating/blocking the expression of SOX-controlled 
genes. 



nearly identical to mouse SOX-LZ 

complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis 



Sequenced by GBF 
Locus: unknown 



Insert length: 2802 bp 

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660 



1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA 
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA 
501 CGAAGGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAAGCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA 
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 
1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT 
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT 
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA 
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG 
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA 
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT 
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA 
2451 TGGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG 
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA 
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG 
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA 
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA 
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AA 



BLAST Results 



NO BLAST result 



Medline entries 



95311974: 

A gene that is related to SRY and is expressed in the testes 
encodes a leucine zipper-containing protein. 

96032826: 

The Sry-related HMG box-containing gene Sox6 is expressed in the adult testis 
and developing nervous system of the mouse. 



Peptide information for frame 1 



ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 
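
The ORF coordinates and peptide length quoted above can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based, inclusive, and exclude the stop codon (an assumption that is consistent with 184 to 2595 bp giving 804 residues), and it translates the first ten codons of the ORF with the standard genetic code. The function and constant names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring spaces."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Nucleotides 184-213 of the insert, copied from the sequence listing above.
ORF_HEAD = "ATGGGAA GAATGTCTTC CAAGCAAGCC ACC"

print(orf_peptide_length(184, 2595))
print(translate(ORF_HEAD))
```

The same length check applies to the other ORFs in this report, for example 13 to 1890 bp for the 626-residue transketolase homolog.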



1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA
801 VSAN



BLASTP hits 



Entry MMSOXLZ2_1 from database TREMBL: 
product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 
Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801 

Entry I51083 from database PIR: 
SOX-LZ - rainbow trout 
Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532 

Entry S59121 from database PIR: 
SOX6 protein - mouse 
Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660 

Entry AB006330_1 from database TREMBL: 
gene: "mSoxSL"; product: "SOX5"; Mus musculus mSoxSL mRNA, complete cds. 
Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457 

Entry MMUO10604_1 from database TREMBL: 
gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for 
transcription factor L-Sox5 
Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281 
Alert BLASTP hits for DKFZphtes3_17nl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl2, frame 1 

Report for DKFZphtes3_17nl2.1 



[LENGTH] 804
[MW] 89332.69
[pI] 6.97
[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YPR065w] 5e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04
[SCOP] d1hmf 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13
[SCOP] d1lefa 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15
[SCOP] d1hrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17
[PIRKW] DNA binding 4e-94
[PIRKW] T-cell receptor 4e-07
[PIRKW] leucine zipper 1e-38
[PIRKW] alternative splicing 2e-07
[PIRKW] transcription factor 4e-16
[PIRKW] transcription regulation 1e-12
[SUPFAM] HMG box homology 0.0
[SUPFAM] unassigned HMG box proteins 4e-94
[PROSITE] ATP_GTP_A 1
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 6
[PFAM] HMG (high mobility group) box
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 13.81 %
[KW] COILED_COIL 3.48 %



SEQ MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP
SEG
COILS
1nhm-

SEQ TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF
SEG
COILS
1nhm-

SEQ GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI
SEG
COILS
1nhm-

SEQ KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI
SEG xxxxxxxxxxxxxxx
COILS CCCCCC
1nhm-

SEQ ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY
SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxx
COILS CCCCCCCCCCCCCCCCCCCCCC
1nhm-

SEQ KPGDNYPVQFIPSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV
SEG xxxxxxxxxxxx
COILS
1nhm-

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL
SEG
COILS
1nhm-

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR
SEG . . . xxxxxxxxxxxxxxxxxx
COILS
1nhm-

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG
SEG . .xxxxxxxxxxxx
COILS
1nhm-

SEQ PGVIDLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP
SEG
COILS
1nhm- CCC

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK
SEG x
COILS
1nhm- CCCHHHHHHHHHHHHHHHTTTTCCHHHHHHHHHHHTTTTTTHHHHHHHHHHKHHHHKHHH

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQIPITTGTGVVYP
SEG xxxxxxxxxxxx
COILS
1nhm- HHHTTTTTTT

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY
SEG xxxxxxx
COILS
1nhm-

SEQ DDYEDDPKSDYSSENEAPEAVSAN
SEG xxxxxx
COILS
1nhm-
PS00001 97->101 ASN_GLYCOSYLATION PDOC00001
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006
PS00008 182->188 MYRISTYL PDOC00008
PS00008 431->437 MYRISTYL PDOC00008
PS00008 437->443 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 575->581 MYRISTYL PDOC00008
PS00008 762->768 MYRISTYL PDOC00008
PS00009 677->681 AMIDATION PDOC00009
PS00017 526->534 ATP_GTP_A PDOC00017
PS00029 187->209 LEUCINE_ZIPPER PDOC00029
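
Several of the Prosite hits listed above can be reproduced by a plain pattern scan. The following is a minimal sketch, assuming the usual PROSITE consensus patterns for these four entries rewritten as regular expressions, and assuming that the table reports the end position as one past the last matching residue (which is what ranges such as 526->534 for an eight-residue P-loop suggest). The helper name `scan` and its `first_pos` parameter are hypothetical.

```python
import re

# PROSITE-style patterns transcribed to regular expressions (assumed forms).
MOTIFS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",        # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",     # [ST]-x(2)-[DE]
    "ATP_GTP_A": r"[AG].{4}GK[ST]",          # [AG]-x(4)-G-K-[ST]
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",   # L-x(6)-L-x(6)-L-x(6)-L
}


def scan(seq, first_pos=1):
    """Return (motif, start, end) tuples; end is start plus motif length."""
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + first_pos
            hits.append((name, start, start + len(m.group(1))))
    return hits


# Residues 181-210 and 521-540 of the 804-residue peptide shown earlier.
zipper_region = "KGTPESLAEKERQLSTMITQLISLREQLLA"
ploop_region = "RFENLGPQLTGKSNEDGKLG"

for name, start, end in scan(zipper_region, 181) + scan(ploop_region, 521):
    print(f"{name} {start}->{end}")
```

Such short patterns match frequently by chance, so hits of this kind indicate candidate sites rather than confirmed modifications.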



Pfam for DKFZphtes3_17nl2.1 



HMM_NAME HMG (high mobility group) box 

HMM * PKRPMNAYMLWMQEMRekIKaENPNdMhNtEISKMiGEMWKnMsEEEKm
      +KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK+
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 644

HMM PYEdMAeeEKqRYMKEMPeYK*
      PY+++ +++ + +++ +P+YK
Query 645 PYYEEQARLSKIHLEKYPNYK 665
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DKFZphtes3_17nl8 



group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known 
proteins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent 
receptor protein signature 1. In E. coli, the tonB protein interacts with outer membrane 
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In 
the absence of tonB these receptors bind their substrates but do not carry out active 
transport. The novel protein seems to be involved in ATP-dependent transport of substances 
into the cell. 

The new protein can find application in modulation of cell permeability and transport of 
suitable substrates into the cell. 



unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern 

Sequenced by GBF 

Locus: unknown 

Insert length: 2853 bp 

Poly A stretch at pos. 2806, no polyadenylation signal found 



1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT 
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA 
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC 
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC 
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT 
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC 
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG 
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG 
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA 
451 GCACAGCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC 
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA 
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG 
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG 
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA 
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG 
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC 
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA 
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC 
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT 
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA 
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG 
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC 
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG 
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA 
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC 
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA 
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA 
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC 
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT 
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA 
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG 
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA 
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC 
1651 CCAAGCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC 
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC 
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG 
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG 
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA 
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG 
1951 ACGTGGAGCT GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG 
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT 
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT 
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC 
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC 
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG 
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA 
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA 
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG 
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA 
2451 TTGGCCCTGG AAGACTATGT GGAGAAGGAG TTATCTCTGG AGGCTGAGAA 

2501 GACAAGAGAG CCTGAAGTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA 

2551 TAACTAGTTG GAAGAAGCAG GCCTCCAAGA AGTAGCGCCA TCCTGGCAGC 

2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC 

2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGr CCTCCAAGCT TCTATAATAA 

2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA 

2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 

2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG 

2851 CCG 



ORF from 237 bp to 2582 bp; peptide length: 782 

Category: putative protein 

Prosite motifs: ATP_GTP_A (122-130) 

TONB_DEPENDENT_REC_1 (1-44) 



1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 

51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP 

101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS 

151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL 

201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 

251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 

301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 

351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR 

401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT 

451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD 

501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK 

551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT 

601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 

651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG 

701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS 

751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 



BLAST Results 



No blast result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_17nl8, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_17nl8, frame 3 



Report for DKFZphtes3_17nl8.3 



[LENGTH] 782
[MW] 88030.16
[pI] 9.22
[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TONB_DEPENDENT_REC_1 1
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 4
[KW] Alpha_Beta
SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL 

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH 

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL 

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE 

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeccccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL 

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT 

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYNHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKNSWQGM 

PRD ccchhhhhhhhhhhhhcccccccceeeecccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED 

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc 



Prosite for DKFZphtes3_17nl8 . 3 



PS00001 91->95 ASN_GLYCOSYLATION PDOC00001 
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001 
PS00001 379->383 ASN_GLYCOSYLATION PDOC00001 
PS00001 598->602 ASN_GLYCOSYLATION PDOC00001 
PS00004 403->407 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 511->515 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 652->656 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 177->180 PKC_PHOSPHO_SITE PDOC00005 
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005 
PS00005 450->453 PKC_PHOSPHO_SITE PDOC00005 
PS00005 497->500 PKC_PHOSPHO_SITE PDOC00005 
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005 
PS00005 523->526 PKC_PHOSPHO_SITE PDOC00005 
PS00005 631->634 PKC_PHOSPHO_SITE PDOC00005 
PS00005 723->726 PKC_PHOSPHO_SITE PDOC00005 
PS00005 774->777 PKC_PHOSPHO_SITE PDOC00005 
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006 
PS00006 131->135 CK2_PHOSPHO_SITE PDOC00006 
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006 
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006 
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 406->410 CK2_PHOSPHO_SITE PDOC00006 
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006 
PS00006 466->470 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006 
PS00006 571->575 CK2_PHOSPHO_SITE PDOC00006 
PS00006 693->697 CK2_PHOSPHO_SITE PDOC00006 
PS00006 717->721 CK2_PHOSPHO_SITE PDOC00006 
PS00008 145->151 MYRISTYL PDOC00008 
PS00008 327->333 MYRISTYL PDOC00008 
PS00008 592->598 MYRISTYL PDOC00008 
PS00008 734->740 MYRISTYL PDOC00008 
PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013 
PS00017 122->130 ATP_GTP_A PDOC00017 
PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354 



(No Pfam data available for DKFZphtes3_17nl8 . 3) 
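
The Prosite hits listed above are plain pattern matches against the peptide, so they can be spot-checked by hand. The following is a minimal sketch, not the tool that produced this report. It assumes the usual PROSITE pattern definitions for PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]) and assumes that the reported range is "1-based start -> start + motif length", which is an inference from the table rather than something the report states. The names `PATTERNS`, `scan` and `PEPTIDE` are illustrative only.

```python
import re

# Regex forms of two PROSITE patterns (assumed from the PROSITE pattern
# syntax, not taken from this report):
#   PS00005 PKC_PHOSPHO_SITE  [ST]-x-[RK]
#   PS00006 CK2_PHOSPHO_SITE  [ST]-x(2)-[DE]
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}


def scan(seq, regex):
    """Return (start, end) pairs: start is 1-based, end = start + motif length."""
    hits = []
    # The lookahead lets overlapping motifs be reported as separate hits.
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# First 60 residues of the frame 3 peptide, copied from the first SEQ line.
PEPTIDE = "MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL"

for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{name} {start}->{end}")
```

Running the same function over the full 782-residue peptide would be the natural way to compare against every row of the table; only the first 60 residues are used here to keep the example short.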
DKFZphtes3_18f3 



group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to the human 
TNF-inducible protein CG12-1. 

The novel protein contains two leucine zippers. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to TNF-inducible protein CG12-1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4608 bp 

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550 
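
The two positions above can be reproduced from the 3' end of the insert. The sketch below is illustrative only: it assumes the canonical AATAAA hexamer as the polyadenylation signal, an arbitrary minimum run of 10 A residues for the poly A stretch, and that the reported positions are 0-based offsets (an inference from the listing, not something the report states). `TAIL`, `TAIL_OFFSET` and both function names are hypothetical.

```python
import re

# Last 58 bases of the insert (positions 4551-4608 in the sequence listing),
# so the first base of TAIL sits at 0-based offset 4550 in the full insert.
TAIL = "AATAAATTGC" + "TCATTCCTCC" + "A" * 36 + "GG"
TAIL_OFFSET = 4550


def find_polya_signal(seq, offset=0):
    """Return the 0-based offset of the first AATAAA hexamer, or None."""
    i = seq.find("AATAAA")
    return None if i < 0 else offset + i


def find_polya_stretch(seq, offset=0, min_len=10):
    """Return (0-based offset, length) of the first run of >= min_len A's."""
    m = re.search("A{%d,}" % min_len, seq)
    return None if m is None else (offset + m.start(), m.end() - m.start())


print(find_polya_signal(TAIL, TAIL_OFFSET))
print(find_polya_stretch(TAIL, TAIL_OFFSET))
```

The minimum run length is a free parameter; a shorter threshold would also pick up A-rich stretches inside the 3' untranslated region, so the value should be chosen with the full sequence in mind.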

1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 

51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 

101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 

151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 

201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 

251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG 

301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 

351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 

401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT 

451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG 

501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 

551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 

601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 

651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 

701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 

751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 

801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 

851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 

901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 

951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 

1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 

1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 

1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 

1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 

1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC 

1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 

1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 

1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 

1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 

1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA 

1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC 

1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 

1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 

1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTGTAGTC 

1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 

1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 

1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 

1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 

1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 

1951 CGTTCACAGA TGACCAAGGA CAGACTGTGT CCCAGAAGCC AAAATGAGAG 

2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT 

2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 

2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 

2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 

2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 

2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 

2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 

2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 

2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 

2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC 

2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA 

2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT 

2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT 

2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA 

2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT 
2751 CTGTAAGAAA CTGTCAGTGA AAATATGTAC AATTCCTTCA ATTTCCATTC 

2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC 

2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT 

2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA 

2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT 

3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA 

3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC 

3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC 

3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC 

3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC 

3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA 

3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA 

3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC 

3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA 

3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT 

3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT 

3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC 

3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC 

3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA 

3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG 

3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT 

3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT 

3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG 

3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT 

3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA 

4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG 

4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT 

4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA 

4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT 

4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT 

4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC 

4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG 

4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA 

4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT 

4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT 

4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA 

4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

4601 AAAAAAGG 



BLAST Results 



Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 

Entry HS073350 from database EMBL: 
human STS EST303564. 
Score = 1417, P = 8.7e-58, identities = 285/287 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2 

PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score 
= 155, P = 4.5e-10 

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I 
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10 
>PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments) 
Length = 779 

HSPs: 

Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10 
Identities = 60/152 (39%), Positives = 67/152 (44%) 

Query; 7 GEAGG PGAAWARRAAAL PGT AA — GPP P. PAA P PGA - - APARGG P APG A PAQA L P RSQRGR 62 

G+ G PG + AR PG GPP PA P GA AP G A A P SQ 

Sbjct: 230 GDLGA PG P SGARGERG FPGERG V EG P PG PAG PRGANGAPGNDGAKG DAGAPGAPGSQG A P 289 

Query: 63 QLAE RNG RPRRH RGALAQPGH PG DLAAG VG RGAGGGHS RRGRHHH VRS LADLLQL PGAAE 122 

L G P RGA PG GD +GA G + G VR L + PG A 
Sbjct: 290 GL QGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDG VRGLTGPIGPPGPAG 341 

Query: 123 GAGDRGHL-P-GP DARDPELPRVFLPLAGLRGPPAA 156 

GD+G P GP D +P P P AG GPP A 
Sbjct: 342 APGDKGEAGPSGPAGTRGAPGDRGEPGPPG P-AGFAGPPGA 381 

Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05 
Identities = 52/154 (33%), Positives = 60/154 (38%) 

Query: 7 GEAGG PGAAWARRAAAL PGTAAG P PRPAA P PGAA PARG GPAPGAPAQALPRSQRG 61 

G G PGAA R P AGPP PPG ++G GPA G P + P G 

Sbjct: 434 GATG F PGAA- GRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPA-GRPGEVG PPG PPG 491 

Query: 62 RQLAERNGRPRRH RGA LAQPGHPGDLAAGVGRGAGGGHS RRGRHHH VRS LADLLQL PGAA 121 

AGP G PG PG RG G +RG R L PG + 
Sbjct: 492 P--AGEKGAPGAD-GPAGAPGTPGPQGIAGQRGVVGLPGQRGE RGFPGL PGPS 541 

Query: 122 EGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVRE 160 

G +G R P P + GL GPP + RE 

Sbjct: 542 GEPGKQGPSGASGERGPPGP— HGPPGLAGPPGESGRE 577 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04 
Identities = 52/148 (35%), Positives = 62/148 (41%) 

Query: 7 GEAGG PGAAWARRAAAL PGTAAG P PRPAA P PGAA PARGG PA PGAP AQALP RSQRG - R 62 

G G PG AR +A PG A G P A PPG + GP PG P A +G R 

Sbjct: 416 GNVGAPGPKGARGSAGPPG-ATGFPGAAGRVGPPGPS-GNAGP-PGPPGPAGKEGSKGPR 472 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRH--HHVRSLADLLQLPGA 120 

GRP G + PG PG GA G G + ++ LPG 

Sbjct: 473 GETGPAGRP GEVGPPGPPGPAGEKGAPGADGPAGAPGT PGPQGIAGQRGVVGLPGQ 528 

Query: 121 AEGAGDRGH--LPGPDARDPEL-PRVFLPLAGLRGPP 154 

G+RG LPGP + P +G RGPP 

Sbjct: 529 R GERGFPGLPGPSGEPGKQGPS GASGERGPP 559 

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04 
Identities = 54/162 (33%), Positives = 64/162 (39%) 

Query: 7 G EAGG PGAAWARRAAALPGT - - AAG P PRPAAP PGAAPARG - -G PA - - PGAP AQAL PRS QR 60 

G G PG + PG A+GP P PPG GGAPGP+P + 

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G -RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHS RRGRHHH V— RSLADLL 115 

G R L G P + HRG G GD +G G G + R L 

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRGLPGFP 148 

Query: 116 QLPGAA--EG-AGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157. 

GAA G AG+RG +PGP P AG +GPP A 

Sbjct: 149 GPKGAAGEPGKAGERG - V PG PPGAVG- - PAGK DGEAGAQG P PG P A 190 

Score = 113 (17.0 bits), Expect = 5.4e-04, P = 5.4e-04 
Identities = 54/148 (36%), Positives = 58/148 (39%) 

Query: 7 GEAGG PGAAWARRAAAL PGT A AG P PRPAA P—PGAAPARGGPAP- GAP AQAL PR 57 

G AG PGA A PG A AGPP PA P PG G P P GA A P 

Sbjct: 374 G FAG P PGADGQPGAKG E PG DAGAKG DAG P PG P AG PAG PPG P I GNVGA PG P KG ARGS AG P P 433 

Query: 58 SQRGRQLAERNG RP RRH RGALAQPGH PGDLAAGVGRGAGGGH S RRGRH HH V RS LA DLLQL 117 

G A P G PG PG +G G GR V 
Sbjct: 434 GATG FPGAAG RVGP PG PSGNAGPPG P PG PAGKEGS KG PRGETG PAGRPG E VG P 486 

Query: 118 PGAAEGAGDRGHLPGPD--ARDPELPRVFLPLAGLRG 152 

PG AG++G PG D A P P +AG RG 

Sbjct: 487 PGPPGPAGEKG-APGADGPAGAPGTPGP-QGI AGQRG 52 1 

Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03 
Identities = 54/151 (35%), Positives = 60/151 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG--AAPAR-GGPAP-GAPAQALPRSQRGR 62 
GE G G A + LPG A GPP A PG P G P P GA + +RG 
Sbjct: 194 GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 252 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
+ PR GA G GD A G+ G +G R A L PG 
Sbjct: 253 EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGL---PGPK- 307 

Query: 123 GAGDRGHLPGPDARD--PELPRVFLPLAGLRGPPAAA 157 
GDRG GP D P V L G GPP A 
Sbjct: 308 --GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340 

Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03 
Identities = 55/154 (35%), Positives = 60/154 (38%) 

Query: 4 NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 61 
NG+ GEAG PG R P AGP A PG RG GA A P +G 
Sbjct: 67 NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 125 

Query: 62 RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLL 115 
+ NGP+G PG PG A GG G V A 
Sbjct: 126 EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 184 

Query: 116 QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 
PG A AG+RG GP A P F L G GPP A 
Sbjct: 185 GPPGPAGPAGERGE-QGP-AGSPGFQGLPGPAGPPGEA 220 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 44/131 (33%), Positives = 49/131 (37%) 

Query: 2 EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 
E GE G PG R LPG GP A PG A RG P P GA A + 
Sbjct: 126 EPGSPGENGAPGQMGPR---GLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 181 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 
G Q P RG G PG G+ G G G+ DL PG 
Sbjct: 182 GAQGPPGPAGPAGERGEQGPAGSPG--FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 237 

Query: 121 AEGAGDRGHLPG 132 
+ G+RG PG 
Sbjct: 238 SGARGERG-FPG 248 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 43/131 (32%), Positives = 55/131 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 
GEAG GARA PG GPP PGA GP PGA Q ++G A+ 
Sbjct: 347 GEAGPSGPAGTRGA--PGDR-GEPGPPGPAGFA GP-PGADGQPGAKGEPGDAGAK 397 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 
+ P G PG G++ A +GA G G + A + PG + AG 
Sbjct: 398 GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 456 

Query: 127 RGHLPGPDARD 137 
G PGP ++ 
Sbjct: 457 PGP-PGPAGKE 466 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 56/162 (34%), Positives = 62/162 (38%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 64 
G G PGA A G GP P PGA ARG P P Q PR +G 
Sbjct: 608 GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARGPAGP-QG-PRGBKGZTG 662 

Query: 65 AERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLA-DLLQ-LPG 119 
+ + + HRG PG PG GA G RG SDL LPG 
Sbjct: 663 ZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 722 

Query: 120 AAEGAGDRGHL--PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 
G RG GP A P P P G GPP+ L +P Q 
Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPGP-PGPPGPPSGGYDLSFLPQPPQ 768 





Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02 
Identities = 49/148 (33%), Positives = 55/148 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGR 62 
G AG PG A R PG A GP A G A A+G P P PA + P G 
Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 
Q P G + G PGDL A G G RG R + PG A 
Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAP----GPSGARGERGFPGE-RGVEGP--PGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 
GGPGD + PG+GP 
Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP--GSQGAP 289 

Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02 
Identities = 40/130 (30%), Positives = 48/130 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60 
G G PG + PG A+GP P PPG GGAPGP+P + 
Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 
G R L G P + HRG G GD +G G G + L 
Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147 

Query: 118 PGAAEGAGDRG 128 
PG AG+ G 
Sbjct: 148 PGPKGAAGEPG 158 

Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02 
Identities = 53/156 (33%), Positives = 61/156 (39%) 

Query: 7 GEAGGPGAAWARRA--AALPGT--AAGPPRPAAPPGAAPARG--GPAPGAPAQAL 55 
G G PGA R A PG AGPPPG+RG GPA P PA A 
Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108 
PR +G + + + HRG G PG + +G G G 
Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705 

Query: 109 RSLADLLQLPGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154 
PG+A G G LPGP P PR AG GPP 
Sbjct: 706 PGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742 



Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02 
Identities = 51/158 (32%), Positives = 58/158 (36%) 

Query: 7 G EAGG PGAAWARRAAAL PGT A AG PPRPAAP PGAAPARGG PAP-GAPAQALPRSQR 60 

G G G R AA LPG AGP PG RG P G P A + 

Sbjct: 287 GA PG LQGMPGERGAAGL PG P KG DRG DAG PKGADGA PGKDG V RGLTG P I G P PGP AGA PGDK 346 

Query: 61 GRQLAERNGRPRRHRGA— LAQPGHPG DLAAGVGRGAGGGHS RRGRHHH VRSLADLLQL 117 

G A+GP RGA +PG PG GA G +G + D 

Sbjct: 347 GE — AGPSG- PAGTRGA PGD RGE PG PPG PAGFAG P PGADGQPGAKGE PG DAGAKGDAGP - 402 

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 

PGAAGG+ AP+R GGPAAR 
Sbjct: 403 PG PAG PAG P PG P I GNVGA PG P KGARGS AGP PGATG F PGAAGR 444 

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02 
Identities = 46/152 (30%), Positives = 57/152 (37%) 

Query: 6 NGEAGGPGAAWARRAAALPGTAA — GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62 

+G G PGA + PG G PA PG AGPP PA++ R + G 

Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

p RG G G+ +G G RG H R + L PG 

Sbjct: 634 AG PI GPVG PAGA RG PAG PQG PRG B KGZTGZZGBRG1KGH-RGFSGLQGPPGPPG 686 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

G++G PAP AG RGPP +A 

Sbjct: 687 SPGEQG— PS-GASGP AG PRG P PGS A 709 

Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02 
Identities = 45/134 (33%), Positives = 56/134 (41%) 

Query: 24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR--GALAQ 80 
P GPP PG +G P PG P + P RG G P ++ G + 
Sbjct: 21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP GPPGKNGDDGEAGK 75 

Query: 81 PGHPGDLAA-GV--GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH--LPGPDA 135 
PG PG+ G RG G G H R + L G A AG +G PG + 
Sbjct: 76 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134 

Query: 136 RDPEL-PRVFLPLAGLRGPPAAA 157 
+ + PR LP G GP AA 
Sbjct: 135 APGQMGPRG-LP--GFPGPKGAA 154 



Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 52/155 (33%), Positives = 58/155 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGP-APGAPAQALPRSQRGRQLA 65 
GEAG G A R A G GPP PA G AGPAGP A + G 
Sbjct: 347 GEAGPSGPAGTRGAPGDRGEP-GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGP 405 

Query: 66 ERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGR--HHHVRSLADLLQLPGAA-- 121 
P G + PG G + GA G GR A PG A 
Sbjct: 406 AGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGK 465 

Query: 122 EGA-GDRGHLPGPDARDPELPRVFLP-LAGLRGPPAA 156 
EG+ G RG GP R E+ P AG +G P A 
Sbjct: 466 EGSKGPRGET-GPAGRPGEVGPPGPPGPAGEKGAPGA 501 

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 51/156 (32%), Positives = 57/156 (36%) 

Query: 7 GEAGG PGAAWARRA AAL PGT - - AAGP PRPAA P PGAA PARGG P APGAPAQAL - PRSQR 60 

G G PGA R A PG AGPPPG+RG PP +P R 
Sbjct: 587 GROGS PGAKGD RGETG PAGAPG P PGA PGA PG PVG PAGKSGDRGETG PAG PIG PVG PAGAR 646 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLA-AGVG— RGAGGGHSRRGRH--HHVRSLADLL 115 

GAGPR+G+GG G+GG G A 

Sbjct: 647 GP— AGPQG-PRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPR 703 

Query: 116 QLPGAAEGAGDRG— HLPGPDARDPELPRVFLPLAGLRGPP 154 

PG+A G G LPGP P PR AG GPP 

Sbjct: 704 GPPGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742 

Score = 90 (13.5 bits), Expect = 2.8e-01, P = 2.5e-01 
Identities = 45/134 (33%), Positives = 53/134 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQ-LA 65 
G G PG A + A GA P P PGA RG GPQ R +RG L 
Sbjct: 485 GPPGPPGPAGEKGAPGADGPAGAPGTPG-PQGIAGQRG--VVGLPGQ---RGERGFPGLP 538 

Query: 66 ERNGRPRRH--RGALAQPGHPGDLA AGV GR-GAGGGHSRRGRHHHVRSLADL 114 
+G P + GA + G PG + AG GR GA G GR + D 
Sbjct: 539 GPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPGAEGSPGRDGSPGAKGDR 598 

Query: 115 LQL-PGAAEGAGDRGHLPGP 133 
+ PAG PGP 
Sbjct: 599 GETGPAGAPGPPGAPGAPGP 618 

Score = 83 (12.5 bits), Expect = 1.8e+00, P = 8.3e-01 
Identities = 49/156 (31%), Positives = 56/156 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA— GPPRPAAPPGAAPARG— GPAP— GAPAQAL PRSQR 60 

G+AG GA A + G GPP PA PG G GPA GAP R + 

Sbjct: 311 GDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDKGEAGPSGPAGTRGAPGD RGEP 367 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRS LAOLLQLPGA 120 

G P G G PGD A G G G + ++ PG 
Sbjct: 368 G P PG PAG FAG P PGADGQPGAKG E PGDAGAKG DAG P PG PAG PAG P PG P I GN VG APGP 423 

Query: 121 AEGAGDRGHLPGPDARDPELPRVFLP LAGLRGP P AAAV RE 160 

G G PG RV P AG GPP A +E 

Sbjct: 424 KGARGS AG P - PGATG FPGAAG RVG PPGP SGNAG P PGP PG PAGKE 466 

Score = 82 (12.3 bits), Expect = 2.3e+00, P = 9.0e-01 
Identities = 46/148 (31%), Positives = 52/148 (35%) 

Query: 7 GEAGG PGAAWARRAAAL PGTAAG P PR P AA P PGAA PARGG PA PGAPAQA L P RSQRGRQLAE 66 

G+AG PGA ++ALGG A PG RGPAP RL 
Sbjct: 275 GDAGAPGAPGSQGAPGLQGMP-GERGAAGLPGPKGDRGDAGPKG-ADGAPGKDGVRGLTG 332 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 

G P G PG G+ G G RG A PGA G 

Sbjct: 333 PIGPP G PAGA PGDKGEAGPSGPAGT RGA PGORG EPG PPG P - AG FAG P PGADGQPGA 387 

Query: 127 RGHL PGP- DARDPELPRVFLP LAGLRGPP 154 

+G PG A+ P P AG GPP 
Sbjct: 388 KGE- PGDAGAKG DAGP PG - - P - AG PAG P P 412 



Peptide information for frame 3 
ORF from 12 bp to 755 bp; peptide length: 248 
Category: similarity to known protein 
Classification: unset 
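
The ORF coordinates and the peptide length stated above are tied together by simple codon arithmetic, and the start of the peptide can be checked against the nucleotide listing. The sketch below assumes the standard genetic code and 1-based, inclusive ORF coordinates. The helper names (`orf_peptide_length`, `translate`, `FIRST_50`) are illustrative and not part of the original analysis pipeline.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of 3")
    return span // 3


def translate(dna, start_bp):
    """Translate dna from the 1-based position start_bp, whole codons only."""
    codons = (dna[i:i + 3] for i in range(start_bp - 1, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# First 50 bases of the insert, copied from the nucleotide listing.
FIRST_50 = "GACAGAAGTGAATGGGAATGGAGAGGCCGGCGGCCCGGGAGCCGCATGGG"

print(orf_peptide_length(12, 755))
print(translate(FIRST_50, 12))
```

The translated prefix can be compared directly with the start of the frame 3 SEQ line in the Pedant report for this clone.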

Prosite motifs: LEUCINE ZIPPER (17-39) 
LEUCINE ZIPPER (24-46) 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 3 

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo 
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score 
= 135, P = 1e-06 

TREMBL:HS6802_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens 
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L, 
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107, 
P = 0.0023 



>TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo sapiens 
TNF-inducible protein CG12-1 mRNA, complete cds. 
Length = 331 

HSPs: 



Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06 
Identities = 30/103 (29%), Positives = 55/103 (53%) 

Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89 

++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A 
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNWSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150 

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132 

G+G+ A IT+ + + +S B + AT D+++ 

Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193 

Pedant information for DKFZphtes3_18f3, frame 2 



Report for DKFZphtes3_18f3 . 2 

[LENGTH] 193 
[MW] 19708.24 
[pI] 11.90 
[KW] All_Alpha 

[KW] LOW_COMPLEXITY 55.44 % 

SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. ■ ■ 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc 

SEQ LPHPQAGGGGHQG 

SEG xxxxxxxxxxxxx 

PRD ccccccccccccc 



(No Prosite data available for DKFZphtes3_18f3 . 2) 
(No Pfam data available for DKFZphtes3_18f3 .2) 

Pedant information for DKFZphtes3_18f3, frame 3 

Report for DKFZphtes3_18f3 . 3 

[LENGTH] 248 
[MW] 27162.56 
[pI] 9.92 
[PROSITE] LEUCINE_ZIPPER 2 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 30.65 % 
[KW] COILED_COIL 12.10 % 

SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX , xxxxxxxxxxxxxxxxxxxx . . XXX 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 

COILS 

MEM 

SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 



COILS 

MEM 

SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS cccccccccccccccccccccccccccccc 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 



Prosite for DKFZphtes3_18f3 . 3 

PS00029 17->39 LEUCINE_ZIPPER PDOC00029 

PS00029 24->46 LEUCINE_ZIPPER PDOC00029 
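
The two overlapping LEUCINE_ZIPPER hits above follow from a leucine at every seventh position. The sketch below assumes the usual PROSITE definition of PS00029 (L-x(6)-L-x(6)-L-x(6)-L) and the same "1-based start -> start + motif length" range convention inferred from the table. A lookahead is used so that overlapping matches are both reported. `ZIPPER`, `leucine_zippers` and `PEPTIDE_60` are hypothetical names.

```python
import re

# PS00029 as a regex; the lookahead makes overlapping hits visible.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(seq):
    """Return (start, end) pairs: start is 1-based, end = start + motif length."""
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in ZIPPER.finditer(seq)
    ]


# First 60 residues of the frame 3 peptide, copied from the first SEQ line.
PEPTIDE_60 = "MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS"

for start, end in leucine_zippers(PEPTIDE_60):
    print(f"PS00029 {start}->{end} LEUCINE_ZIPPER")
```

A plain `re.finditer` without the lookahead would consume the first match and miss the second hit, which starts inside it.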



(No Pfam data available for DKFZphtes3_18f3 . 3) 
DKFZphtes3_1817 



group: cell structure and motility 

DKFZphtes3_1817 encodes a novel 1050 amino acid protein with weak partial similarity to 
ankyrins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins which interconnect integral proteins with the 
spectrin-based membrane skeleton. Thus the novel protein seems to be involved in coupling of 
cytoskeleton and cell membrane. 

The new protein can find application in modulation of cytoskeleton-membrane interactions. 



similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found 



1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT 
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TGATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT 
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 AGCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC 
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG 
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC CTATCACCTG TCCTTCGAGA 
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC 
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC 
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG 
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA 
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC 
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT 
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT 
2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGCTGA 
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA 
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT 
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT 
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT 
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT 
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG 
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG 
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC 
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA 
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC 
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT 
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG 
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT 
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC 
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA 
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT 
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT 
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG 
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA 
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC 
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT 
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA 
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA 
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT 
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA 
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT 
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA 
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG 
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG 
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA 
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT 
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT 
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG 
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG 
4501 G 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (945-953) 
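
The ORF coordinates, reading frame and peptide length given above are tied together by simple arithmetic, which is useful for checking OCR-damaged numbers in listings of this kind. The following is a minimal sketch, assuming 1-based inclusive nucleotide coordinates and that the stop codon is not counted in the ORF range (an assumption that is consistent with the figures in this listing, not something the report states). The function names are hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3


def reading_frame(orf_start: int) -> int:
    """Return the forward reading frame (1, 2 or 3) implied by the ORF start."""
    return (orf_start - 1) % 3 + 1


# Values taken from the report line "ORF from 134 bp to 3283 bp".
length = peptide_length(134, 3283)
frame = reading_frame(134)
print(length, frame)
```

For the ORF reported here the sketch yields a peptide length of 1050 in frame 2, matching the report.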



1 MALVDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST 
51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF 
101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE 
151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK 
201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ 
251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ 
301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY 
351 CLTSFEAAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC 
401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND 
451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL 
501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY 
551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET 
601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS 
651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE 
701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV 
751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN 
801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA 
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV 
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG 
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS 
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_1817, frame 2 

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin 
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21 

PIR:I49502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27 

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31 

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE 
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31 

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score = 
380, P = 8.2e-31 



>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 



HSPs: 

Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31 
Identities = 139/447 (31%), Positives = 207/447 (46%) 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
           +G+T LH+AA+ GQ ++ LV+ GA VNA G TPL++A Q+ + V LL A+ 
Sbjct: 77 KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 136 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVES-CRL 558 
           V +G TPL +A GHE+ V L+ Y + RL 
Sbjct: 137 QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 196 

Query: 559 DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 615 
           D+ ++ G TPLHIAA + V + LL GAS + TPL A S+ +V+ 
Sbjct: 197 PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA--SRRGNVIM 254 

Query: 616 AYHLSFERRQKSSEAPVQSPQRSVDSISQESSTS-SFSSMSAGSR-QEETKKDYREVEKL 673 
           L +R + E + + ++ S + G+ Q +TK + 
Sbjct: 255 V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 311 

Query: 674 LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 732 
           A GD L+ VR LL++ E ++D T+ P H C R+AKV 
Sbjct: 312 AAQGDHLDCVRLLLQYDAE-IDDI--TLDHLTP--LHVAAHC GHHRVAKVLL 358 

Query: 733 S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 791 
           G N + +G +PLH+A + + LLLK GA+ A PLH+A GH 
Sbjct: 359 DKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLP 418 

Query: 792 VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 851 
           +VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 
Sbjct: 419 IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 478 

Query: 852 IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVV 896 
           H +V+LLL + A+ + T + A + + +L ++ 
Sbjct: 479 RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 523 




Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30 
Identities = 130/447 (29%), Positives = 195/447 (43%) 

Query: 465 TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 524 
           TPLH AA G + ++L+ GA + A +G +P+H+A Q + LLL Y A + 
Sbjct: 274 TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 333 

Query: 525 557 
           + T PLH+A GH K L+ + +C+ 
Sbjct: 334 ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 393 

Query: 558 ---LDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
           +D EG TPLH+A+ G+ +++ LLQ GAS + N ETPL A + V 
Sbjct: 394 GASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 453 

Query: 615 EAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLL 674 
           + Y L + + + Q+P I + +A T L 
Sbjct: 454 K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH TPLH 508 

Query: 675 RAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPASG 734 
           A +G +E V LLE ++AT PH+KA+L + 
Sbjct: 509 IAAREGHVETVLALLE KEASQACMTKKGFTP--LHVAAKYGKVRVAELLLER D 559 

Query: 735 LGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVK 794 
           N ++G +PLHVA H D+++LLL G + + + PLH+A +Q +V + 
Sbjct: 560 AHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 619 

Query: 795 CLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEK 854 
           LL N + + G TPL A GH E+VALLL A+ N N G T LH E 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 679 

Query: 855 HVFVVELLLLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
           HV V ++L+ HG V + T + A N K+++ L 
Sbjct: 680 HVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 




Score = 367 (55.1 bits), Expect = 1.8e-29, Sum P(2) = 1.8e-29 
Identities = 131/489 (26%), Positives = 210/489 (42%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPL-CFCDDCEKLVSGRLNDPSVVTPFSRD 460 
           HIAS GN V LL + + + PL C + +S L D ++ 
Sbjct: 244 HIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 302 

Query: 461 DRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKA 520 
           G +P+H+AA + LL+ A ++ TPLH+A G+ V +LL A 
Sbjct: 303 KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 362 

Query: 521 SAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGV 580 
           + NG TPLH+AC H ++ L+ +D E G TPLH+A+ G+ + 
Sbjct: 363 KPNSRALNGFTPLHIACKKNHVRVMELLLK---TGASIDAVTESGLTPLHVASFMGHLPI 419 

Query: 581 IETLLQNGASTEIQNRLKETPLKCAL---NSKILSVMEAYHLSFERRQKSSEAPVQSPQR 637 
           ++ LLQ GAS + N ETPL A ++++ + + K + P+ R 
Sbjct: 420 VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAAR 479 

Query: 638 SVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTE 693 
           ++ + E++ + + +AG VE +L + + +T 
Sbjct: 480 IGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTP 539 

Query: 694 EDLEDAEDTVSAAD PEFCHPLCQ CP-KCAPAQKRLAKVPA SGLGVNVTS 741 
           + VA+ HP PALV G+ + 
Sbjct: 540 LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPA 599 

Query: 742 QDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNA 801 
           +G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 600 WNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 659 

Query: 802 KPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVEL 861 
           N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ 
Sbjct: 660 NGNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKF 719 

Query: 862 LLLHGASVQVLNK 874 
           LL H A V K 
Sbjct: 720 LLQHQADVNAKTK 732 




Score = 345 (51.8 bits), Expect = 4.2e-27, Sum P(2) = 4.2e-27 
Identities = 146/506 (28%), Positives = 233/506 (46%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQK---MCHPLCFCDDCEKLVSGRLNDPSVVTPFS 458 
           H+AS G+ K V LL +E + T +K H + ++V +N + V + 
Sbjct: 50 HLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQ-DEVVRELVNYGANVN--A 106 

Query: 459 RDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 518 
           + +G TPL++AA + + L+ GA N G TPL +A Q+G+ ++V L++Y 
Sbjct: 107 QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 166 

Query: 519 KASAEVQDNNGNTP-LHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGY 577 
           +V+ P LH+A ++D A V + D+ ++ G TPLHIAA + 
Sbjct: 167 GTKGKVR-----LPALHIAAR--NDDTRTAAVLLQNDP-NPDVLSKTGFTPLHIAAHYEN 218 

Query: 578 QGVIETLLQNGASTEIQNRLKETPLKCAL---NSKILSVMEAYHLSFERRQKSSEAPVQS 634 
           V + LL GAS + TPL A N ++ ++ E + K P+ 
Sbjct: 219 LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 278 

Query: 635 PQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 693 
           R+ . E + + A +TK + A GD L+ VR LL++ 
Sbjct: 279 AARNGHVRISEILLDHGAPIQA KTKNGLSPIHM AAQGDH LDCV RLLLQY DA 329 






WO 01/12659 



PCT/IB00/01496 



Query: 694 729 
           E ++D D++ CH++ P C R+ + 
Sbjct: 330 E-IDDITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVME 388 

Query: 730 VPA-SGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQG 788 
           + +G ++ ++ G +PLHVA+ G +++ LL+ GA+ N PLH+A + G 
Sbjct: 389 LLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAG 448 

Query: 789 HFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALH 848 
           H +V K LL + AK N K - TPL A GH +V LLL++ A+ N + G+T LH 
Sbjct: 449 HTEVAKYLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLH 508 

Query: 849 EAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIM--ELL 893 
           A E HV V LL AS + K+ T + A + K+ ELL 
Sbjct: 509 IAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELL 555 




Score = 243 (36.5 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 64/199 (32%), Positives = 97/199 (48%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
           H+A+ G + E LL ++ H + PL L +L P +P S 
Sbjct: 541 HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 600 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
           G+TPLH+AA Q + L+ G NA G TPLHLA Q+G+ + LLL +A+ 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 
           + + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++ 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV---MVDATTRMGYTPLHVASHYGNIKLV 717 

Query: 582 ETLLQNGASTEIQNRLKETPL 602 
           + LLQ+ A + +L +PL 
Sbjct: 718 KFLLQHQADVNAKTKLGYSPL 738 




Score = 242 (36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 63/176 (35%), Positives = 92/176 (52%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 
           G VN T Q+G +PLH+A+ G ++RLLL GA + D+ PLH A + GH ++ 
Sbjct: 229 GASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRIS 288 

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
           + LLD A K +G+P+ A GH+V LLLQ+ A I+ T LH A 
Sbjct: 289 EILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHC 348 

Query: 854 KHVFVVELLLLHGA--SVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAET 909 
           H V ++LL GA + + LN + C + + ++MELL AS+D V E+ 
Sbjct: 349 GHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTG ASIDAVTES 403 


Score = 242 (36.3 bits), Expect = 3.3e-14, Sum P(2) = 3.3e-14 
Identities = 80/284 (28%), Positives = 129/284 (45%) 

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461 
           HIA+ G+ + V LL +E +K PL K+ L P + 
Sbjct: 508 HIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGK 567 

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521 
           G TPLHVA ++ LL+ +G ++ ++G TPLH+A ++ V LL Y S 
Sbjct: 568 NGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGS 627 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581 
           A + G TPLHLA GH+V L+ ++GN+ G TPLH+ A+ G+ V 
Sbjct: 628 ANAESVQGVTPLHLAAQEGHAEMVALLLSKQANG---NLGNKSGLTPLHLVAQEGHVPVA 684 

Query: 582 ETLLQNGASTEIQNRLKETPLKCAL---NSKILSVMEAYHLSFERRQKSSEAPV-QSPQR 637 
           + L+++G + R+ TPL A N K++ + + + K +P+ Q+ Q+ 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 638 S-VDSISQ--ESSTSSFSSMSAGSRQEETKK--DYREVEKLLRAVAD 679 
           D ++ ++ S S G+ K Y V +L+ V D 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVVTD 791 





Score = 235 (35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 58/165 (35%), Positives = 83/165 (50%) 

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793 
           G N S G +PLH+AA G A+++ LLL AN H PLHL Q+GH V 
Sbjct: 625 GGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVA 684 






Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853 
           L+ + G TPL A G+ +LV LLQH A +NA G + LH+A + 
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744 

Query: 854 KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS--KIMELLQVV 896 
           H +V LLL +GAS ++ T + A+ + + ++L+VV 
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789 

Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34 
Identities = 67/202 (33%), Positives = 100/202 (49%) 

Query: 404 HIAS-GNQKEVERLLSQEDHDKDTVQKMCH--PLCFCDDC-EKLVSGRLNDPSVVTPFSR 459 
           H+A+ G+ + RLL QD+D++HPL C V+LD PSR 
Sbjct: 310 

Query: 460 
           +++LL+ GA ++A G TPLH+A G+ + 
Sbjct: 368 

Query: 520 
           TPLH+A GH + K L+ +++ + TPLH AAR G+ 
Sbjct: 426 

Query: 580 
Sbjct: 485 

Score = 226 

Query: 743 
           +G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 

Query: 803 PNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 
           N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L 
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 

Query: 863 LLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893 
           L H A V K ++AQ++I+LL 
Sbjct: 721 LQHQADVNAKTKLGYSPLHQAAQQGHTDIVTLL 753 

Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11 
Identities = 51/157 (32%), Positives = 82/157 (52%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 
           + T++ G++ LH+AAL G+ +++R L+ +GAN A++ PL++A Q+ H +VVK L 
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFL 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 
           L++ A N G TPL A GH +VA L+ +G ALH A 
Sbjct: 131 LENGANQNVATEDGFTPLAVALQQGHENVVAHLINYGTK----GKVRLPALHIAARNDDT 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAE--QNSKIMELL 893 
           +LL + + VL+K J T + A +N + +LL 
Sbjct: 187 

Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) 
Identities = 55/143 (38%), Positives = 68/143 (47%) 

Query: 463 
           GHTPLH+AA G + L+ K A G TPLH+A + G 
Sbjct: 503 

Query: 523 
           NG TPLH+A + + D VK L+ S N G TPLHIAA+ 
Sbjct: 563 

Query: 583 
           +LLQ G S ++ TPL A 
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLA 642 

Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28 
Identities = 54/185 (29%), Positives = 89/185 (48%) 

Query: 738 NVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 797 
           N+ ++ G +PLH+ A G + +L+KHG A PLH+A G+ ++VK LL 
Sbjct: 662 NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 721 

Query: 798 DSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 857 
           A N K G +PL A GH ++V LLL++GAS N ++ G T L A ++ 
Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781 

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917 
           V + +L + V++ V+S PV.+ DV+E + +E ++ 
Sbjct: 782 VTDVLKV VTDETSFVLVSDKHRMS FPETVDEILDVSEDEGEELISF 827 

Query: 918 KIRKK 922 
           K ++ 
Sbjct: 828 KAERR 832 



Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29 
Identities = 41/121 (33%), Positives = 67/121 (55%) 

Query: 486 GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 545 
           G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V 
Sbjct: 35 GVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVV 94 

Query: 546 KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 605 
           + LV Y ++ ++KG TPL++AA+ + V++ LL+NGA+ + TPL A 
Sbjct: 95 RELVNY---GANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVA 151 

Query: 606 L 606 
           L 
Sbjct: 152 L 152 

Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06 
Identities = 89/318 (27%), Positives = 140/318 (44%) 

Query: 448 LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKG 507 
           L + + V ++DD+ TPLH AA G +++ LL+ AN G TPLH+A ++G 
Sbjct: 457 LQNKAKVNAKAKDDQ--TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREG 514 

Query: 508 YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD 552 
           + L LL +AS G TPLH+A YG + L+ D 
Sbjct: 515 HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 574 

Query: 553 --VESCRLDI GNE KGDTPLHIAARWGYQGVIETLLQNGASTEIQNRL 597 
           V LDI G+ G TPLHIAA+ V +LLQ G S ++ 
Sbjct: 575 VAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQ 634 

Query: 598 KETPLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 656 
           TPL A M A LS +Q + +S + ++QE + 
Sbjct: 635 GVTPLHLAAQEGHAE-MVALLLS---KQANGNLGNKSGLTPLHLVAQEGHVPVADVLIKH 690 

Query: 657 GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 716 
           G + T + LA G++++V++LL+ + D+ +A+ + + PL Q 
Sbjct: 691 GVMVDATTR--MGYTPLHVASHYGNIKLVKFLLQH-QADV-NAKTKLGYS PLHQ 740 

Query: 717 CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 751 
           + + + +G N S DG++PL +A 
Sbjct: 741 AAQQGHTDI-VTLLLKNGASPNEVSSDGTTPLAIA 774 

Score = 162 (24.3 bits), Expect = 1.8e-07, Sum P(2) = 1.8e-07 
Identities = 48/149 (32%), Positives = 71/149 (47%) 

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796 
           V D ++ AA G D L++G + N + LHLA ++GH ++V L 
Sbjct: 5 VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 64 

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856 
           L GNT LA G E+V L+ +GA++NA + KG T L+ A E H+ 
Sbjct: 65 LHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 124 

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885 
           VV+ LL +GA+ V + T + A Q 
Sbjct: 125 EVVKFLLENGANQNVATEDGFTPLAVALQ 153 

Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26 
Identities = 38/135 (28%), Positives = 65/135 (48%) 

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519 
           + G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y 
Sbjct: 42 NQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG 101 

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579 
           A+ Q G TPL++A H + VK L+ ++ E G TPL +A + G++ 
Sbjct: 102 ANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHEN 158 

Query: 580 VIETLLQNGASTEIQ 594 
           V+ L+ G +++ 
Sbjct: 159 VVAHLINYGTKGKVR 173 



Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21 
Identities = 37/119 (31%), Positives = 58/119 (48%) 

Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ--DNNGNTPLHLACTYGHEDCVKALVYYDVE 554 
           AT A + G ++ L H + ++ + NG LHLA GH V L++ ++ 
Sbjct: 13 ATSFLRAARSG--NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70 

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 
           L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+ 
Sbjct: 71 ---LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127 

Query: 615 E 615 
           + 
Sbjct: 128 K 128 

Score = 106 (15.9 bits), Expect = 1.8e-01, Sum P(2) = 1.6e-01 
Identities = 34/121 (28%), Positives = 54/121 (44%) 

Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828 
           +GRADA A+G+ L+ N++G LA GH ++V 
Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63 

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888 
           LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q + 
Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123 

Query: 889 I 889 
           + 
Sbjct: 124 L 124 

Score = 40 (6.0 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 
           +RRQ+ EVQ+++Q++ Q++ +K++R V 
Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQVVQGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKV 1669 

Score = 36 (5.7 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14 
Identities = 6/12 (50%), Positives = 10/12 (83%) 

Query: 806 KDLSGNTPLIYA 817 
           +D++G T L+YA 
Sbjct: 1186 EDITGTTKLVYA 1197 
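
The Identities and Positives percentages in the HSP headers above are plain ratios of the two counts, and in this listing they appear to be truncated rather than rounded (for example 195/447 is reported as 43%). The sketch below, which assumes the cleaned `Identities = a/b (p%), Positives = c/d (q%)` layout and uses a hypothetical helper name, checks one header line for internal consistency. It can help flag digits that were misread during text extraction.

```python
import re

# Matches a header such as "Identities = 139/447 (31%), Positives = 207/447 (46%)".
HSP_STATS = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def hsp_stats_consistent(line: str) -> bool:
    """Return True if both reported percentages equal floor(100 * count / length)."""
    match = HSP_STATS.search(line)
    if match is None:
        raise ValueError("line does not look like an HSP statistics header")
    id_n, id_d, id_pct, pos_n, pos_d, pos_pct = map(int, match.groups())
    # Integer division implements the truncation assumed above.
    return 100 * id_n // id_d == id_pct and 100 * pos_n // pos_d == pos_pct


print(hsp_stats_consistent("Identities = 139/447 (31%), Positives = 207/447 (46%)"))
print(hsp_stats_consistent("Identities = 130/447 (29%), Positives = 195/447 (43%)"))
```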

Pedant information for DKFZphtes3_1817, frame 2 



Report for DKFZphtes3_1817.2 



[LENGTH] 1050 
[MW] 117013.72 
[pI] 6.47 
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034C] 5e-13 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-08 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233C] 3e-08 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04 
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att 
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (GABP) alpha GA bindini 4e-12 
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12 
[PIRKW] phosphotransferase 1e-19 
[PIRKW] nucleus 1e-13 
[PIRKW] potassium channel 5e-15 
[PIRKW] tumor suppressor 1e-09 
[PIRKW] duplication 1e-14 
[PIRKW] tandem repeat 1e-19 
[PIRKW] heterodimer 1e-14 
[PIRKW] potassium transport 5e-15 
[PIRKW] cell cycle control 1e-10 
[PIRKW] serine/threonine-specific protein kinase 1e-19 
[PIRKW] transmembrane protein 5e-15 
[PIRKW] transport protein 5e-15 
[PIRKW] DNA binding 2e-11 
[PIRKW] oncogene 1e-08 
[PIRKW] ATP 1e-19 
[PIRKW] protein kinase inhibitor 1e-09 
[PIRKW] voltage-gated ion channel 5e-15 
[PIRKW] phosphoprotein 4e-3fi 
[PIRKW] apoptosis 1e-19 
[PIRKW] liver 4e-09 
[PIRKW] integrin binding 3e-16 
[PIRKW] differentiation 
[PIRKW] transforming protein 1e-08 
[PIRKW] alternative splicing 1e-40 
[PIRKW] coiled coil 1e-14 
[PIRKW] peripheral membrane protein 2e-38 
[PIRKW] transcription factor 4e-16 
[PIRKW] transcription regulation 2e-16 
[PIRKW] nucleotide binding 5e-15 
[PIRKW] phosphoric monoester hydrolase 1e-12 
[PIRKW] cytoskeleton 8e-39 
[PIRKW] calmodulin binding 1e-19 
[PIRKW] smooth muscle 1e-12 
[SUPFAM] ankyrin 1e-40 
[SUPFAM] death-associated protein kinase 1e-19 
[SUPFAM] ankyrin repeat homology 1e-40 
[SUPFAM] protein kinase homology 1e-19 
[SUPFAM] vaccinia virus 27.4K HindIII-C protein homology 3e-07 
[SUPFAM] int-3 transforming protein 1e-08 
[SUPFAM] unassigned ankyrin repeat proteins 2e-38 
[SUPFAM] notch protein 2e-12 
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13 
[SUPFAM] rel homology 2e-11 
[SUPFAM] EGF homology 2e-12 
[PROSITE] ATP_GTP_A 1 
[PFAM] Ank repeat 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.05 % 



SEQ MALVDEDLLKNPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP 
SEG 
1awcB 

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR 
SEG 
1awcB 

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA 
SEG 
1awcB 

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVEIYVHHEIYNLIFKYVGTMEASEDAAFN 
SEG 
1awcB 

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ 
SEG 
1awcB 

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE 
SEG xxxxxxxxxx 
1awcB 

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE 
SEG 
1awcB 

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID 
SEG 
1awcB 

SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG 
SEG 
1awcB 

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET 
SEG 
1awcB 

SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ 
SEG xxxxxxxxxxxxxxxxxxxxxx 
1awcB 

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC 
SEG 
1awcB 

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP 
SEG 
1awcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTTTTCCH 

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN 
SEG 
1awcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHHHHHHHKCCCTTTTEE 

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCV 
SEG 
1awcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHC 

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD 
SEG 
1awcB 

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR 
SEG 
1awcB 

SEQ RHTVEDAVVSQGPEAAGPLSTPQEVSASRS 
SEG 
1awcB 



Prosite for DKFZphtes3_1817.2 
PS00017 945->953 ATP_GTP_A PDOC00017 
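
The position of this motif can be re-derived from the peptide sequence. The sketch below searches residues 901-960 of the frame 2 peptide for the P-loop consensus `[AG]-x(4)-G-K-[ST]`. That pattern is the commonly cited form of PROSITE PS00017 and is an assumption here, since the report itself does not print the pattern. The helper name and the 1-based inclusive coordinate convention are likewise illustrative choices.

```python
import re

# Commonly cited PS00017 (ATP/GTP-binding site motif A) consensus, written as a regex.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# Residues 901-960 of the 1050-residue peptide listed under frame 2.
FRAGMENT = "ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD"
FRAGMENT_START = 901  # 1-based position of the first residue in FRAGMENT


def find_p_loops(seq: str, first_pos: int) -> list[tuple[int, int, str]]:
    """Return (start, end, matched residues) with 1-based inclusive coordinates."""
    hits = []
    for m in P_LOOP.finditer(seq):
        start = first_pos + m.start()
        end = first_pos + m.end() - 1
        hits.append((start, end, m.group()))
    return hits


hits = find_p_loops(FRAGMENT, FRAGMENT_START)
print(hits)
```

With inclusive coordinates the eight-residue match spans 945 to 952. The end value of 953 printed in the report would be consistent with an end-exclusive convention, although the report does not say which convention it uses.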



Pfam for DKFZphtes3_1817.2 



HMM_NAME Ank repeat 

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+AA ++ ++++LL+++GA +N 
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN 490 

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G TPLHtA++ + ++ LLL + A+ 

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523 

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPLH+A+ Y+++++V+ L+ + 
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556 

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
G+TPLHIAAR + +++ LLQ+GA+ 

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592 

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G +PLH+AA +++ +++RLLL+HGA+ 
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771 






36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 
PLH+A+++++ ++V+- LL+ +A +N 

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804 

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+TPL++A+ ++ E+V LLLQHGA+IN 
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837 

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins 

Alignment to HMM consensus: 
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+T+LH A+++ +V +V+LLL HGA++ 

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870 

group: testes derived 

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae 
protein YFL046w. 

The protein contains an RGD cell attachment site. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405, 0/. 3 cR from top of Chr11 linkage group" 
Insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found 



1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 
51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 
101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 
151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 
201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 
251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 
301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 
351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 
401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 
451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 
501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 
551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 
601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 
651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 
701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG 
751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT 
801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 
851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 
901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 
951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 
1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 
1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 
1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 
1151 GGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 
1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 
1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 
1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA 
1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS419346 from database EMBL:
human STS WI-13569.
Score = 2154, P = 8.6e-91, identities = 446/459

Entry HS1292427 from database EMBL:
human STS SHGC-50338.
Score = 1737, P = 7.2e-72, identities = 359/369

Entry HS253344 from database EMBL:
human STS WI-13893.
Score = 1578, P = 1.0e-64, identities = 358/397



Medline entries 



No Medline entry 
Peptide information for frame 3



ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 



1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD
51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK 
101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ 
151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT 
201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY 
251 RFWK 
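The peptide above can be cross-checked against the insert sequence by translating from the reported ORF start. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the hard-coded fragment (bases 156 to 185 of the insert) are illustrative choices and not part of the original report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 156-185 of the DKFZphtes3_19f19 insert, copied from the listing.
orf_start = "ATGAATAGTCGCCAGGCTTGGCGGCTCTTT"
print(translate(orf_start))  # MNSRQAWRLF
```

The ten residues printed agree with the first block of the frame 3 peptide.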



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19f19, frame 3

SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME
I., N = 1, Score = 144, P = 8.4e-09

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 138, P = 5.4e-08



>SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I. 
Length = 211



HSPs: 



Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09
Identities = 34/121 (28%), Positives = 67/121 (55%)
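The percentages in the HSP header follow directly from the match counts. A small sketch of the arithmetic is given below; the helper name `percent` is hypothetical, and truncation (rather than rounding) is an assumption inferred from another HSP in this listing that reports 30/67 as 44%.

```python
def percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed convention)."""
    return 100 * matches // aligned


print(percent(34, 121), percent(67, 121))  # identities, positives
```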



Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128

LE G+ AETI + + ++ +L + K + +A+QE ++ QQ L IRK + 
Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104 

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 



Pedant information for DKFZphtes3_19f19, frame 3



Report for DKFZphtes3_19f19.3



[LENGTH] 254

[MW] 29505.73

[pI] 6.99

[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12

[PROSITE] RGD 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 5.12 %

[KW] COILED_COIL 11.02 %
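Report lines of the form `[TAG] value` lend themselves to simple machine parsing. The sketch below is illustrative only: it assumes the bracketed tags are free of OCR damage, and the names `TAG_LINE` and `parse_report` are hypothetical, not part of the Pedant software.

```python
import re
from collections import defaultdict

# One bracketed tag followed by free text, e.g. "[KW] COILED_COIL 11.02 %".
TAG_LINE = re.compile(r"^\[(\w+)\]\s*(.*)$")


def parse_report(lines):
    """Group report values by tag; a tag such as KW may occur several times."""
    report = defaultdict(list)
    for line in lines:
        m = TAG_LINE.match(line.strip())
        if m:  # silently skip blank or non-tag lines
            report[m.group(1)].append(m.group(2).strip())
    return dict(report)


example = [
    "[LENGTH] 254",
    "[MW] 29505.73",
    "[pI] 6.99",
    "",
    "[KW] TRANSMEMBRANE 1",
    "[KW] COILED_COIL 11.02 %",
]
report = parse_report(example)
print(report["LENGTH"], report["KW"])
```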



SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT 

SEG 

PRD ccchhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc 

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD



PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
MEM

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM MMMMMMM 

SEQ TCLAIALGFYRFWK 

SEG 

PRD hhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMM 



Prosite for DKFZphtes3_19f19.3
PS00016 15->18 RGD PDOC00016 
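The reported RGD position can be reproduced with a plain substring search. The sketch below uses the first 60 residues of the frame 3 peptide; the helper `find_motif` is hypothetical, and the coordinate convention (1-based start, end one past the last residue) is an assumption chosen to match the "15->18" shown in the report.

```python
def find_motif(peptide: str, motif: str):
    """Return (start, end) pairs: 1-based start, end = start + len(motif)."""
    hits = []
    pos = peptide.find(motif)
    while pos != -1:
        hits.append((pos + 1, pos + 1 + len(motif)))
        pos = peptide.find(motif, pos + 1)  # allow overlapping hits
    return hits


# First SEQ line of the DKFZphtes3_19f19.3 report (residues 1-60).
n_term = "MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT"
print(find_motif(n_term, "RGD"))  # [(15, 18)]
```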



(No Pfam data available for DKFZphtes3_19f19.3)
DKFZphtes3_19j17



group: testes derived 

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to C. elegans
Y40B1A.2 protein.

The novel protein contains two Prosite WW/rsp5/WWP domain signatures.

The WW domain (also called the rsp5 or WWP domain) was originally discovered as a short conserved
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP
protein, mouse NEDD-4 and yeast RSP5.

The domain is repeated up to 4 times in some proteins. It has been shown to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a conserved
Pro. It is frequently associated with other domains typical of proteins in signal
transduction processes.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to C.elegans Y40B1A.2 

there are two long ORFs in this cdna according to EST: 
HS12146/HS75086/AA923755/MMAA17335 remaining intron at Bp 1506-1733 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp 

Poly A stretch at pos. 2740, no polyadenylation signal found 



1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG 
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT 
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA 
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG 
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT 
1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA 
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGAAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 
1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACA^AAT TCTTCTGCCC GATCCACGTG 
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC

2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA 

2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC 

2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA 

2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT 

2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT 

2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA 

2451 TTTTTGAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA 

2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA 

2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT 

2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA 

2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA 

2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA 

2751 AAAAAAAAAA AA 



BLAST Results 



Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1188IS map 10p11.2-10p12.1,
complete sequence.

Score = 2130, P = 0.0e+00, identities = 426/426
12 exons matching Bp 492-2740 



Medline entries 
No Medline entry 



Peptide information for frame 2 



ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 
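The ORF coordinates and the peptide length are tied together by simple arithmetic. In the sketch below the helper name `peptide_length` is hypothetical, and it assumes (as the figures in this listing suggest) that the reported ORF span is inclusive at both ends and excludes the stop codon.

```python
def peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


print(peptide_length(1757, 2383))  # frame 2 ORF of this clone
print(peptide_length(354, 1661))   # frame 3 ORF of this clone
```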

1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE 
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19j17, frame 2
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116)
WW_DOMAIN_1 (90-116)



1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY

51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 

101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 

151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 

201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT

251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 

301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 

351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR 

401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19j17, frame 3

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid
Y40B1A, N = 1, Score = 144, P = 1.8e-09

>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A
Length = 120



Score = 144 (21.6 bits), Expect = 1.8e-09, P = 1.8e-09
Identities = 30/67 (44%), Positives = 43/67 (64%)

Query: 90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK---DRDYRRE 145

W+E +SSSGK YYYN +TE+SQW+KP EW E +++ K VN P+ DR Y
Sbjct: 11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69

Query: 146 VMQATATS 153

+ Q +++S 
Sbjct: 70 IGQLSSSS 77 

Pedant information for DKFZphtes3_19j17, frame 2

Report for DKFZphtes3_19j17.2

[LENGTH] 209

[MW] 22673.85

[pI] 9.95

[KW] All_Alpha

[KW] LOW_COMPLEXITY 13.40 %

SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA

SEG xxxxxxxxxxxxxxx. . xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 

(No Prosite data available for DKFZphtes3_19j17.2)
(No Pfam data available for DKFZphtes3_19j17.2)

Pedant information for DKFZphtes3_19j17, frame 3

Report for DKFZphtes3_19j17.3

[LENGTH] 436 

[MW] 47716.62 

[pI] 8.71

[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152C] 6e-04

[BLOCKS] BL01159 WW/rsp5/WWP domain proteins

[PROSITE] WW_DOMAIN_1 2

[PFAM] WW/rsp5/WWP domain containing proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 22.48 % 
SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS

SEG xxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhhhhcccccccccccccccccccccccccccccc 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL

SEG 

PRD hhcccccceeehhhhhhhhhhhcccccccccccccccccccceeecccchhhhhhhccee 

SEQ VWNGSIMVQRLLQPSG 

SEG 

PRD eeccchhhhhhccccc 



Prosite for DKFZphtes3_19j17.3

PS01159 90->116 WW_DOMAIN_1 PDOC50020

PS01159 90->116 WW_DOMAIN_1 PDOC50020



Pfam for DKFZphtes3_19j17.3



HMM_NAME WW/rsp5/WWP domain containing proteins 

HMM *LP3GWEeHWDpSGRpWYYWNMETkTTQWEpP*

+ ++W EH++ SG+ YY+N T+ +QWE+P 
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP 115 
DKFZphtes3_1c1



group: signal transduction 

DKFZphtes3_1c1 encodes a novel 632 amino acid putative GTPase-activating protein, related to
Drosophila rotund transcript and human n-chimaerin.

Rac small GTPase is associated with type-I phosphatidylinositol 4-phosphate 5-kinase and
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is
expected to activate p21rac-related small GTPases.

The new protein can find application in modulating/blocking the response to a cellular
receptor.

similarity to GTPase-activating proteins 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found 

1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 
51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 

101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 

151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT 

201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG 

251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 

301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 

351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 

401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 

451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 

501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 

551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 

601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 

651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 

701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 

751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT 

801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 

851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 

901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 

951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 
1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 
1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 
1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 
1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 
1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 
1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 
1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 
1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 
1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT 
1451 CAGCAAAGTG GATGATATCC ATGCTATCTG TAGCCTTCTA AAAGACTTTC 
1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 
1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 
1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 
1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 
1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT 
1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA 
1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 
1851 ATGATGGTGG AGCAAGAGAA CATTGACCCC CTACATGTCA TTGAAAACTC 
1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 
1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 
2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 
2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 
2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 
2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 
2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT 
2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 
2301 CAAGTCACAT AATATCAGCA CTTTGTGCTG GTCATTGTTG GGAGCTTTTA 
2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 
2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT 
2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 
2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 
2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT 
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CCCCATTATG AACTGAGCTA

2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGACA GAACTAGTGA 

2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 

2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCCTATT TATCTCTGAT 

2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT 

2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT 

2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 

2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT 

3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 

3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 

3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 

3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 

3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA 



BLAST Results 



Entry UB2984 from database EMBLEST: 

Homo sapiens DRES 56 mRNA sequence. 

Score = 8775, P = 0.0e+00, identities = 1757/1758

matches 3' end 



Medline entries 



93074974:

Developmental regulation and neuronal expression of the mRNA of rat n-chimaerin, a
p21rac GAP: cDNA sequence.

93024458:

A Drosophila rotund transcript expressed during spermatogenesis and imaginal disc
morphogenesis encodes a protein which is similar to human Rac GTPase-activating
(racGAP) proteins.



Peptide information for frame 3 



ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 



1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE
51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI
101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG
151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK
201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG
251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC
301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM
351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF
401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD
451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF
501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI
551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS
601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK



BLASTP hits 



Entry CEK08E3_4 from database TREMBLNEW:

gene: "K08E3.?**; Caenorhabditis elegans cosmid K08E3

Score = 452, P = 2.6e-48, identities = 126/377, positives = 189/377

Entry A48122 from database PIR:

GTPase-activating protein Rac homolog, splice form clone pcl.7 - fruit
fly (Drosophila melanogaster) (fragment)

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry B48122 from database PIR:

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit
fly (Drosophila melanogaster)

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry DM22539_1 from database TREMBL:

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP
(rotund) gene, complete cds.

Score = 460, P = 9.2e-46, identities = 111/270, positives = 155/270

Entry S29128 from database PIR:
N-chimerin - rat

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253



Alert BLASTP hits for DKFZphtes3_1c1, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_1c1, frame 3



Report for DKFZphtes3_1c1.3



[LENGTH] 632

[MW] 71026.84

[pI] 9.08

[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d -
fruit fly (Drosophila melanogaster) 2e-46

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155C] 2e-11

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155C] 2e-11

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155C]
2e-11

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins
[S. cerevisiae, YOR127w] 5e-09

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08

[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins

[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 1e-55

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 1e-49

[PIRKW] breakpoint cluster region 1e-19

[PIRKW] transmembrane protein 7e-08

[PIRKW] brain 3e-22

[PIRKW] alternative splicing 1e-19

[PIRKW] P-loop 2e-25

[SUPFAM] CDC24 homology 3e-22

[SUPFAM] bcr protein 3e-22

[SUPFAM] myosin motor domain homology 2e-25

[SUPFAM] pleckstrin repeat homology 4e-10

[SUPFAM] LIM metal-binding repeat homology 2e-09

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29

[PROSITE] MYRISTYL 6

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 3

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 1

[PROSITE] DAG_PE_BINDING_DOMAIN 1

[PFAM] Phorbol esters / diacylglycerol binding domain

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.22 % 

[KW] COILED_COIL 8.54 % 

SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK

SEG

COILS CCCCCCCCCCCC

1rgp-



SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE

SEG

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

1rgp-

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR 

SEG 

COILS 
1rgp-



SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESIVAKTTVTVPNDGGPIEAVSTIETVP

SEG

COILS

1rgp-

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC

SEG

COILS

1rgp-

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP

SEG

COILS

1rgp-

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS

SEG 

COILS 



SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL 

SEG 

COILS 

1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKVVERLLSLPLEYWS 

SEG 

COILS 

1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS

SEG xxxxxxxxxxx

COILS

1rgp- GGCCCCHHHHH

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK

SEG xxx

COILS

1rgp-



Prosite for DKFZphtes3_1c1.3



PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00004 141->145 CAMP_PHOSPHO_SITE PDOC00004
PS00004 182->186 CAMP_PHOSPHO_SITE PDOC00004
PS00004 246->250 CAMP_PHOSPHO_SITE PDOC00004
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 174->177 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005
PS00005 313->316 PKC_PHOSPHO_SITE PDOC00005
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005
PS00005 595->598 PKC_PHOSPHO_SITE PDOC00005
PS00005 606->609 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 270->274 CK2_PHOSPHO_SITE PDOC00006
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006
PS00006 387->391 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 410->414 CK2_PHOSPHO_SITE PDOC00006
PS00006 449->453 CK2_PHOSPHO_SITE PDOC00006
PS00006 489->493 CK2_PHOSPHO_SITE PDOC00006
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006
PS00007 46->55 TYR_PHOSPHO_SITE PDOC00007
PS00007 376->385 TYR_PHOSPHO_SITE PDOC00007
PS00008 131->137 MYRISTYL PDOC00008
PS00008 150->156 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 377->383 MYRISTYL PDOC00008
PS00008 388->394 MYRISTYL PDOC00008
PS00008 623->629 MYRISTYL PDOC00008
PS00009 303->307 AMIDATION PDOC00009
PS00479 287->336 DAG_PE_BINDING_DOMAIN PDOC00379



Pfam for DKFZphtes3_1c1.3



HMM_NAME Phorbol esters / diacylglycerol binding domain 

HMM *HrFmrHTFrqPTWCDHCgeFIWGMgKQGYQCQnCgMNCHKRCHelVPmm 

H+F+ +T + P +C CG +1 +GK ++C +C+++ H +C+ + P 
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 

HMM C* 
C 

Query 335 C 335 
DKFZphtes3_1g13



group: intracellular transport and trafficking 

DKFZphtes3_1g13 encodes a novel 1007 amino acid protein with similarity to human 256 kD
golgin.

The new protein contains 7 leucine zippers and seems to be involved in protein-protein
interaction in the Golgi apparatus. The very similar rat cplSl shows
haploid-specific transcription in Mus musculus testis.

The new protein can find application in modulating protein traffic in the Golgi apparatus,
especially in human haploid germ cells.



similarity to 256 kD golgin, strong similarity to rat "cplSl"
21 exons encoded on AC004682

EST from a testis library, two mouse ESTs of a testis cDNA library,
rat cplSl shows haploid-specific transcription!
testis or haploid-specific transcription

Sequenced by DKFZ

Locus: map="16q22.2"

Insert length: 3405 bp

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373



1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT 

51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA 

101 AGCTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG 

151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA 

201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT 

251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC 

301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG 

351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT 

401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT 

451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC 

501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA 

551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG 

601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC 

651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA 

701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC 

751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT 

801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA 

851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT 

901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG 

951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA 

1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG 

1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA 

1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT 

1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC 

1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA 

1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA 

1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAGAAA 

1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG 

1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG 

1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG 

1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA 

1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA 

1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA 

1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA 

1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA 

1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC 

1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT 

1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC 

1901 GTATCAAGCA CCAGCACAGG GAGCAAGGCT CCATCAAATG CAAGTTAGAA 

1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT 

2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT 

2051 TGCGGCAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA 

2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC 

2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA 

2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC 

2251 CTGCAGGCCC AGCTGGACAA AGCTCTGCAG AAGGAGAAGC ACTATCTCCA 

2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG 

2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC 

2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA 
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC
2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 
2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 
2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 
2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC 
2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC 
2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 
2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC 
2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA
2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA 
2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 
3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 
3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 
3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC 
3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 
3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 
3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 
3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 
3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 
3401 AAAAA 



BLAST Results 



Entry AC004682 from database EMBLNEW: 

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete
sequence.

Score = 1291, P = 0.0e+00, identities = 265/272



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 133 bp to 3153 bp; peptide length: 1007 
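
The ORF coordinates and the peptide length above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the reported coordinates are 1-based, inclusive, and exclude the stop codon.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1  # inclusive span in base pairs
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 133 bp to 3153 bp, as reported for frame 1
print(orf_peptide_length(133, 3153))  # 1007
```

Under these assumptions the span of 3021 bp gives 1007 codons, which agrees with the reported peptide length.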

Category: similarity to known protein

Prosite motifs: LEUCINE_ZIPPER (83-105) 

LEUCINE_ZIPPER (90-112) 

LEUCINE_ZIPPER (97-119) 

LEUCINE_ZIPPER (104-126)

LEUCINE_ZIPPER (403-425)

LEUCINE_ZIPPER (410-432) 

LEUCINE_ZIPPER (918-940) 
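
The leucine zipper hits listed above correspond to the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L. As a minimal sketch (not the tool that produced this report), the pattern can be scanned with a regular expression; the function name and the `offset` parameter are hypothetical, and the test segment is residues 81-130 of the peptide listed in this section.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L, wrapped in a lookahead so overlapping hits are found
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str, offset: int = 1) -> list:
    """Return 1-based start positions of all (overlapping) pattern matches.

    `offset` is the sequence position of the first character of `peptide`.
    """
    return [m.start() + offset for m in LEUCINE_ZIPPER.finditer(peptide)]


# Residues 81-130 of the frame 1 peptide
segment = "RQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLHH"
print(find_leucine_zippers(segment, offset=81))  # [83, 90, 97, 104]
```

The four start positions agree with the first four motif entries; the end coordinates in the report appear to be given as start + 22.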



1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA

51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH 

101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN 

151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK 

201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS

251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 

301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM

351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD 

401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ 

451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 

501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE 

551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS

601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 

651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ 

701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE

751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF 

801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM

851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 

901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA

951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH 
1001 CPGSSYC 

BLASTP hits 
Entry HS417401_1 from database TREMBL: 

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete
cds.

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862
Entry SCINTANA_1 from database TREMBL: 

Saccharomyces cerevisiae integrin analogue gene, complete cds.
Score = 404, P = 6.2e-34, identities = 199/897, positives = 423/897

Entry HS6802_2 from database TREMBL:

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain,
ESTs, CA repeat, STS and GSS.

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028
Entry AF092090_1 from database TREMBL: 

product: "cpl51"; Rattus norvegicus cpl51 mRNA, partial cds.

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733



Alert BLASTP hits for DKFZphtea3_lgl3, frame 1

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin,
N = 1, Score = 411, P = 4.4e-34

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230

mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene,
complete cds., N = 1, Score = 404, P = 7.1e-34



>TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin
Length = 2,185

HSPs:

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34
Identities = 212/816 (25%), Positives = 420/816 (51%)
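
The percentages printed next to the identity and positive counts can be reproduced from the raw fractions. This is a small sketch under one assumption: that percentages are truncated (floored) rather than rounded, which is consistent with the figures shown in this report but is not confirmed by it. The function name is hypothetical.

```python
def percent(matches: int, aligned_length: int) -> int:
    """Integer percentage of matches over the alignment length, truncated."""
    return matches * 100 // aligned_length


# First HSP against HSGOLGIN_1: 212 identities and 420 positives over 816 columns
print(percent(212, 816), percent(420, 816))  # 25 51
```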

Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203 

+M + E+ G L +EQL ++ +ERSL+ YR KY ++ ++L+ + K LQ 

Sbjct: 119 DMDSEAEDLVGNSDSLNKEQLI— QRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262

G I+ Q D S RI +Q Q+ +K L E + + ++D I L+ +

Sbjct: 176 G ILSQSQ DKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227

Query: 263 LAC SNALVLEREKALIKLQADFASCTATHRYPPSSS EEC-ED — IKKILKHLQE 313 

++ + + ++ K L +L+ A PS E ED K L+ LQ+ 

Sbjct: 228 VSLLKQRLRNGPMNVDVLKPLPQLEPQ-AEVFTKEENPESDGEPVVEDGTSVKTLETLQQ 286

Query: 314 QKDSQ CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ q C ++ ++ L E EA+ EQ ++++ K+ + DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 344 

Query: 367 HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV--QNSL 424

+ +D I Q Q+ + ET++ + + L+ K+E + +L ++ Q+ Q 
Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVIAETKR QMHETLEMKEEEIAQLRSRIKQMTTQGEE 400 

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 
Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ— KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456 

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL ++LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK--LHEKELARKEQELTKKLQTRERE--FQEQMK 512

Query: 543 TSNRKRVEELSLELSEALRKLENSDKEKRQLQKT--VAEQDMKMNDMLDRIKHQHREQGS 600

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ +
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED-----KREQLKKSKEHEKLMEG---ELEALR-QEFKKKDKTL 651

++ LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQEHKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE IALQKESLMS 706 

+ ++ + E E LR + C + E+ L +K Q I+++N++ + +++ L S 
Sbjct: 632 QVLKQQYQTEMEKLREK---CEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDVKQTELES 688

Query: 707 LQAQLDKALQKEKHYLQT--TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764
L ++L + L K +H L+ ++ K* D + ++ A D+ Q V S K +
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824

S+ +T+ KA L++ + I E +K+ + L++ + + E ++ + +L++ S ++
Sbjct: 746 SIQRTE--KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ ♦ E L + + KD C 

SbjCt: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937

L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + +++EN

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT--QV-YESKLEDGNKEQEQTKQILVEKENM 912

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 
Sbjct: 913 ILQMREGQKKEIEILTQKLSAK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26
Identities = 216/953 (22%), Positives = 468/953 (49%)

Query: 2 KDEAGERDRE--VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM-----EEAM 51
K+E E D E V S K L +LQ +K + + KR + +T+Q + + C +EA+
Sbjct: 260 KEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ----LQQLK--KKLLVLQQELEFHTEELQ 105
D ++ + ++ + + LR ++QL+ K ++ + + + + H E L+
Sbjct: 320 QEQLDERLQELEKIKDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMH-ETLE 378

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164
+ Q +S +++ T+ L K K + EE +T +K A+ +L
Sbjct: 379 MKEEEIAQLRSRIKQMTTQGEELREQ-KEKSERAAFEELEKAL---STAQKTEEARRKLK 434

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222
A ++I + + E++ R Q LS + E+++ K + + + + Q+ K K
Sbjct: 435 AEMDEQIKTIEKTSEEERISLQQELSRVKQEVVDVMKKSSEEQIAKL--QKLHEKELARK 492

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282
+ T +E +E Q+++ +K SQ + L ++ + +L LE ++LQ
Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL--KISQEKEQQESLALEE----LELQK 544

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341
A T + +E E + + L+ + ++E +N KDL V LEA + +
Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS---LQENKNQSKDLAVHLEAEKNK 600

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400
+I +K++LL++A K++ Q +++L+ E E +K TL KD
Sbjct: 601 HNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659

Query: 401 K------FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453
K ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK
Sbjct: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511
K E E K + + + ++ ++ ++ Q+ + K++ L++ + L++
Sbjct: 718 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQ 776

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570
++ +AD I+ + ELQ + + + Q++ ++ + +L++ +KL + + E+
Sbjct: 777 AHVENLEAD-IKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835

Query: 571 RQLQKTVAEQDMKMNDM---LD--RIKHQHREQGSIK--CKLEEDLQEATKLLEDKREQL 623
L K VAE + + D+ LD +I+ Q Q K ++E+ ++ T++ EKE
Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895

Query: 624 KKSKEHEK--LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681
K+E K L+E E L+ +K K ++ ++KL + +++ + T+ ++
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741
K +Q +++ + + K+ L+ +A+L K L E L+ + + + ++A + A
Sbjct: 955 KMEKVKQKAKEMQETL---KKKLLDQEAKLKKEL--ENTALELSQKEKQFNAKMLEMAQA 1009

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800
++ A+ +L T++ + ++ SLT+ + +L + I +E KKLN + +L+
Sbjct: 1010 NSAGISDAVSRLE--TNQKEQIE-SLTEVHRR--ELNDVISIWE---KKLNQQAEELQEI 1061

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW--QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858
H E+++ + +++ E+ ++L + +K+ N ++ KEE ++ + + L+E L
Sbjct: 1062 H---EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116

Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ--WAKQQKVANEKLGNQLREQVNYI- 915
+ L Q K L + + +L++ + ++Q V + L + + +V+ +
Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951
+KL + S+ ++ NK L+ K +E KK+ E
Sbjct: 1176 SKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE 1212

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26
Identities = 215/951 (22%), Positives = 433/951 (45%)

Query: 10 REVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALAFEESE 69 

+E + +++L L+++ KQKL+ EA + H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL----EAEKNKHNKEIT--VMVEKHK 613

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129 

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK--H-QQDALWTEKLQVLKQQYQTEMEKLREK---CEQEKETLLKD-KEIIFQA 666

Query: 130 HHCKLKE DEVILYEEEMGNHNENTGEKL— HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 
SbjCt: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K YQSSLSNIELLECQVKMLQGE — LGGIMGQEPENKGDHSKVRIYTSPCMIQEHQE 237 

+ +Q++1+E+V++EL +Q +K+ +++ 

SbjCt: 727 EQKNHHQQQVDSI-IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
SbjCt: 785 DIKRS EGELQQASAKLDVFQSYQS ATHEQTKAYEEQLAQLQQKLLDLE-TERIL 837 

Query: 298 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE + +■ +K + ++ E 
SbjCt: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891 

Query: 357 LH-GLREETSAHIERKDKDITILQCRL-QELQLEFTETQKLTLKKDKF— LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+ ++E TQKL+ K+D L E+ E + 

Sbjct: 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEIL-TQKLSAKEDSIHILNEEYETKFK 950

Query: 412 ELEKKLTQVQNSLLK KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466 

EKK+ +V+ + K+K L+++ + ELE T E Q K K+ K L+ Q 

SbjCt: 951 NQEKKMEKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQ-KEKQFNAKMLEM-AQ 1008 

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526

+ +A RL Q Q + + H D +K L Q+A+ +QE+
Sbjct: 1009 ANSAGISDAVS--RLETNQKEQIESLTEVHRRELNDVISIWEKKL---NQQAEELQEIH- 1062

Query: 527 ELQMLQKESSMAEKEQT SNRKRV— EELSLELSEALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

SbjCt: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITHLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 575 KTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L++++ + L+E L S LE+++++ K + 

SbjCt: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182 

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E ++L+ +K +K+L++ S + ++ +E L +L C + E+ h T++ + + 
SbjCt: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT----QALE 750

K A+ + Q + K KE ++T E +A R+ Q+ L QA 
Sbjct: 1242 KTNAILSR-ISHCQHRTTKV--KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLN TELRK--LRGFHQESE 805 

+L ++ KS++ + +K L++E ++ + T+L+K + + 

SbjCt: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTL 1357 

Query: 806 LEVHAFDKKLE--EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863 

++ +KK+E +S Q+ Q QN + L+ KE + +++ K LL D + 

Sbjct: 1358 MKEELKEKKVEISSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE---QVNYIAKLSG 920

++ K+ D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475 

Query: 921 EKDH-LHSVMVHLQQENKK---LKKEIEEKKMKAE 951

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQINLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25
Identities = 209/953 (21%), Positives = 438/953 (45%)
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNS----SHD 56

MK + E+ ++ L+ K L+ + + + + + ++ +E++ + S +

Sbjct: 470 MKKSSEEQIAKLQKLHEKELARK-EQELTKKLQTREREFQEQMKVALEKSQSEYLKISQE 528

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116 

K+Q ++LA EE E++ K+ L + + KL LQQE E + + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAILTESEN— KLRDLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE— VILYEE EMGNHNENT — GEKLHLAQEQLALA 167 

+ Q+ DL + K K ++ ++ E+ E H ++ EKL + + +Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL— ERSLNLYRDK— YQSSLS-- NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

♦ K+ + L +DK +Q+ + N + LE ++ + 0 EL + + E 

Sbjct: 642 MEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700

Query: 221 SKVRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ s ++ + + +T K E+ K+ +Q + Q+ + + + + ++R + +K 
Sbjct: 701 HKLEEELS--VLKD--QTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKD 756

Query: 281 QADFASCTATHR--YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+++ +K + + ++ +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398

EQ + + ++ LE + L ++ A +E + KD+ C EL + Q L + 

Sbjct: 817 YEEQLAQLQQKLLDLETERILLTKQV-AEVEAQKKDV----CT--ELDAHKIQVQDLMQQ 869

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 

+K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 
Sbjct: 870 LEK---QNSEMEQKV-KSLTQVYESKLEDGNKEQEQTKQILVEKENMILQMREGQKKEIE 925

Query: 458 C--KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK--LQKGLLL 513

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKAK----EMQETLKKKLLDQEAKL 981

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

K+ +T EL Q+E Q K MA+ V L E + L ++ +R+ 

Sbjct: 982 KKELENTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL--TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKS KE 628 

L ++ + K+N + ++ H Q K + +L++ L ++E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688 

+ L L+++ K+K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV--NS--LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL---DKALQ--KEKHYLQTTITKEA---YDALSRKSAA 740

+ +L K + L ++L D+ Q K H ++ + LS + A 

SbjCt: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNH VTS ETKSLQQSLTQTQE K KAQLEEE 1 1 AY EERMKK L 790 

Q D+ T+AL E +N +S+T ++ ++ Q + +++E ++ + +L 

SbjCt: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+ LR+L + +LEE Q+ K + D++ L ++E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNI SFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

SbjCt: 1328 G--NQQQAASEKESC-ITQ— LKKELSE N I N AVT LHK EE L KEKKVE I S S LS KQLT D 1378 

Query: 911 QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951 

Q+ LS ++ + S+ +E +L + +++ K + 

SbjCt: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422 

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25
Identities = 226/941 (24%), Positives = 444/941 (47%)

Query: 61 QALAFEESEVE--FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118

Q L E+ +++ S+ LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENT GEKL HLAQEQLALA 167 

+ QSL + + D+ ++E+ EH GE+ ++L 

Sbjct: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284

Query: 168 GDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYT 227
Sbjct: 285 QQRVKRQENLLKRCKETIQSHK EQCTLLT S E KEALQEQL DERL -QELEKIKD LHMAE 340

Query: 228 SPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASC 287
+I +++++++ Q +I E + ++ L ++ E+ + +L++
Sbjct: 341 KTKLITQLRDAKNLIEQLEQDKGM---VIAETKRQM--HETLEMKEEE-IAQLRSRIKQM 394

Query: 288 TATH---RYPPSSSEEC--EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL-------RVE 335
T R SE E+++K L Q+ ++++ E +K + R+
Sbjct: 395 TTQGEELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERIS 454

Query: 336 LEA-VSEQKRNIMKDMMKL--ELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTET 392
L+ +S K+ ++ D+MK E + L++ + RK++++T +LQ + EF E
Sbjct: 455 LQQELSRVKQEVV-DVMKKSSEEQIAKLQKLHEKELARKEQELT---KKLQTREREFQEQ 510

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452
K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+
Sbjct: 511 MKVALEKSQ--SEYLKISQEKEQ-----QESLALEELELQKKAIL-TESENKLRDLQQE- 561

Query: 453 SKEAECKALQAEVQKLKNSLEEAKQQER-----LAAQQAAQCKEEAALAGCHLEDTQR-K 506
++ + L+ E L+ SL+E K Q + L A++ KE + H + + K
Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLK 620

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRK-LEN 565
Q+ L ++ Q+ Q E++ L +E EKE K + + E K LE
Sbjct: 621 HQQDALWTEKLQVLKQQYQTEMEKL-REKCEQEKETLLKDKEII-FQAHIEEMNEKTLEK 678

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR--E 621
D ++ +L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K +
Sbjct: 679 LDVKQTELESLSSE----LSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQ 733

Query: 622 QLKKS--KEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN---ENLRAELQCCSTQL 676
Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ +L
Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793

Query: 677 ESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736
+ + K + Q +++ +E L LQ +L L+ E+ L TK+ + ++
Sbjct: 794 QQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERILL----TKQVAEVEAQ 848

Query: 737 KSAACQD---------DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ--LEEEIIAYEE 785
K C + DL Q LEK N SE + +SLTQ E K + +E+ +
Sbjct: 849 KKDVCTELDAHKIQVQDLMQQLEKQN---SEMEQKVKSLTQVYESKLEDGNKEQEQTKQI 905

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVL--QWQKQHQNDLKMLAAKEEQL 843
++K N L+ G Q+ E+E+ +E S +L +++ + +N K + +++
Sbjct: 906 LVEKENMILQMREG--QKKEIEILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKA 963

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899
+E QE LK+ LL+ ++ L++ L + Q ++A+
Sbjct: 964 KEMQE---TLKKKLLDQEAK---LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK--EIEEKKMKAENTRL 955
A +L + EQ+ + ++ E + + S+ L Q+ ++L++ EI+ ++ + E L
Sbjct: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAEL 1076

Query: 956 CTKALGPSRTESTQREKVCGTLGWKGLPQD 985
K L E + K L +G+ QD
Sbjct: 1077 KQKIL-LFGCEKEEMNKEITWLKEEGVKQD 1105

Score = 326 (49.9 bits), Expect = 6.0e-25, P = 6.0e-25
Identities = 220/907 (24%), Positives = 444/907 (48%)

Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILE---KQTS 123
E+E G+S + QL Q +++ EL T+Y L++ + L+ Q+
Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182

Query: 124 DLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNLYRD 183
D L +L+E+ + +++ H + E+ + E+ I+ L+ ++L +
Sbjct: 183 DKSL-RRIAELREE--LQMDQQAKKHLQ---EEFDASLEE---KDQYISVLQTQVSLLKQ 233

Query: 184 KYQSSLSNIELLECQVKMLQGELGGIMGQE-PENKG-----DHSKVR-IYTSPCMIQEHQ 236
+ ++ N+++L+ + L+ + +E PE+ G D + V+ + T ++ +
Sbjct: 234 RLRNGPMNVDVLK-PLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQE 292

Query: 237 ETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPS 296
KR E Q +Q L+ K A L ER + L K++ D T
Sbjct: 293 NLLKRCKETIQSHKEQCTLLTS--EKEALQEQLD-ERLQELEKIK-DLHMAEKTKLIT-- 346

Query: 297 SSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELD 356
+ D K +++ L++ K + E + + + L++E++QR++KM +
Sbjct: 347 QLRDAKNLIEQLEQDKGM--VIAETKRQMHETLEMKEEEIA-QLRSRIKQMTTQGEE 400

Query: 357 LHGLREETS-AHIERKDKDITILQCRLQE----LQLEFTETQKLTLKKDKFLQEKDEMLQ 411

L +E+ + A E +K ++ Q + +E L+ E E K T++K +E+ + Q
Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TIEKTSE-EERISLQQ 457

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+++ K Q + E E+ KE Q+ +K+ + + + + Q +K 
Sbjct: 458 ELSRVKQEVVDVMKKSSEEQIAKLQKLH-EKELARKE--QELTKKLQTREREFQEQ-MKV 513

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+LE++ QEL Q++EAL L+ ++LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572 

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587 

+ ES+E+S VLE++ +++ +K K +L+ +QD + 
Sbjct: 573 ES-SLEKSLQENKNQSKDLAVH-LEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE--QLKKSKEHEKLMEGELEALRQEF 644

L +K Q++ E ++ K E QE LL+DK Q + +EK +E +L+ + E
Sbjct: 631 LQVLKQQYQTEMEKLREKCE---QEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTEL 686

Query: 645 KKKDKTLKE--NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE--IA 698

+ L E +R KLEEE L+ + +LE+ +++ N QQ + + KE ++

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVS 746

Query: 699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS------AACQDDLTQAL 749

+Q+ E + Q QL+ L +++KH + E +A ++S A+ + D+ Q+

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 806

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ ++ L Q Q+K LE E I ++++++++ +++V 
Sbjct: 807 QSATH--EQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864

Query: 810 A FDXKLE EMSCQVLQWQKQHQN - - DLKMLAAKEEQLRE FQEEMAALKEN L L EDDKE 863 

++LE+ + ++ Q K + K+ +EQ E +++ KEN++ E K+ 

Sbjct: 865 DLMQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ--EQTKQILVEKENMILQMREGQKK 922

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE--KLGNQLREQV-NYIAK 917

L Q S +D+ + N++ T + Q K +KV + ++ L++++ + AK
Sbjct: 923 EIEILTQKLSAKEDSIHIL--NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKALGPSRTESTQREKV 973 

L K L + + L Q+ K+ ++ E M N+ + A+ SR E+ Q+E++ 
Sbjct: 981 L---KKELENTALELSQKEKQFNAKMLE--MAQANSAGISDAV--SRLETNQKEQI 1029

Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24
Identities = 184/827 (22%), Positives = 405/827 (48%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59

++ E G + + S S + L+ ++ + ++ L++ ++ + D Q 

SbjCt: 1323 LQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q +++ EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSISLSEKEAAISSLR----KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIA 172 

E K+ + H +KE ++ L + + ++ E+++L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD--EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNIEL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRIYTSPC 230 

L+ + + K + SN+E L+ Q + EL + Q+ E + + ++ Y 
Sbjct: 1497 CLKGEHEDDKSKMEKKESNLETELKSQTARIM-ELEDHITQKTIEIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290

I EH+E ++L + ++D+ ++E K+ L LE + +K + +

Sbjct: 1556 DI-EHKELVQKLQHFQELGEEKDNRVKEAEEKI-----LTLENQVYSMKAELETKKKELE 1609

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + Lt + ++ +++++++ +L + E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEK----EE 1664

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL--TLKKDKFLQEKD 407 

K + H E + ++ +++++ IL+ +L+ +♦ +ET + + K E++ 
Sbjct: 1665 QYKKGTESH--LSELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQE 1722 

Query: 408 EM-----LQEL-EKKLTQVQNSLLKKEKEL-----EKQQCMATELEMTVK-EAKQDKSKE 455

E +Q+ E+K++ +Q +L +KEK L EK++ +++ EM + + + K +

Sbjct: 1723 EADSQGCVQKTYEEKISVLQRNLTEKEKLLQRVGQEKEETVSSHFEMRCQYQERLIKLEH 1782

Query: 456 AECKAL--QAEVQKLKNSLEEAKQQERLAAQQAAQCK--EEAALAGCHLEDTQRKLQKGL 511
AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L
Sbjct: 1783 AEAKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS--LELSEALRKLENSDKE 569

++K T Q L+++++ L +S + +++ +R +EEL+ E +AL++++ +K
Sbjct: 1843 ---QEKELTCQILEQKIKEL--DSCLVRQKEV-HRVEMEELTSKYEKLQALQQMDGRNKP 1896

Query: 570 KRQLQKTVAEQD---MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625

L++ E+ + +L ++ OH + E + Q+ K + ++ L+ 

SbjCt: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685

KEH++ ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 

SbjCt: 1957 RKEHQQ ELEILKKEYDQ EREEKIKQEQEDL--ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744 

Q Q+L I ++A+L ++ Q+E +L IEDLR+A ++ 

SbjCt: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMK--KLNTELRKLRGFH 801 

+ A E+ +T++ + LQ L + Q+K Q LE+E + + +L T+L + 

Sbjct: 2062 ILDAREE--EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 

+S+L+ F +++ + ++ +++K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 

Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24
Identities = 213/977 (21%), Positives = 454/977 (46%)

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61

E R+ +V S+ K L+ Q + ++ +H++ +QK++L++ +++
Sbjct: 1034 EVHRRELNDVISIWEKKLNQQAEELQEIHEI-QLQEKEQEVAELKQKILLFGCEKEEMNK 1092

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL----VLQQE--LEFHTEELQTSYYSLRQY 114

+ + + E G + L +LQ QLK+K + Q E L+ H E+L+ + 

Sbjct: 1093 EITWLKEE---GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLNKSLKE 1149

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEV---ILYEEEMGNHNENTGEKLHLAQEQLALAGDKI 171

+ L++Q +L +L K K E+ + +E +++ EK + + E +L K+ 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKL 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228

«■ L L++ K ++ L EL+ L I +++ K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINISSSKTNAILSRI--SHCQHRTTKVKEALLIK 1267

Query: 229 PCMIQEHQ-------ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280

C + E + E Q L+ +Q+ + Q ++++ A +LV E+E L
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKENQIKSMKADIESLVTEKEA----L 1323

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q++ + SECI++KLE++LEE +K+ +VE+ ++S 
Sbjct: 1324 QKEGGN QQQAASEKESC--ITQLKKELSENINAVTLMKEE LKEKKVEISSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQEL--QLEFTETQKLT-L 397

+Q ++ ♦ + L S+ ++ D++ L ++Q+L +++ +K++ L 

Sbjct: 1374 KQLTDLNVQLQN-SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK-DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV---KEAKQDKS 453

++ D + + E ++ + + TQ QN++ + + +LE + A E + + KE ++
Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQN 1492

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512 

K +C + E K K +E+ + L +Q A + E + +£++++ K 
SbjCt: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

+ +QK +EL ++LQ Q+ + +++ L ++ +LE KE 

SbjCt: 1552 -HQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL----EDKREQLKKSK 627

+ +V ++ ++ + DR++ + + +K K E+ + K L E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL----RAELQCCSTQLESSLNK 682

E EL QE +++ L+E + +E ++E L A+ T+ E + ++

Sbjct: 1671 ESHLS---ELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727

Query: 683 YNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739

T ++ I L + + +KE L+ Q +K H+ +E L A 

Sbjct: 1728 GCVQKTYEEKISVLQRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLIKLEHAEA 1785
Query: 740 ACQDDLTQALEKLNHVTSET--KSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKL 797

+D Q+ + + H+ E K+ + SL Q + + + I ++ ++ + +++K 
Sbjct: 1786 KQHED--QSM--IGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKT 1841 

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856 

QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K 

SbjCt: 1842 L---QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKYEKLQALQQMDGRNKPT 1897 

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915 

LLE++ E PK + ++ + L A+++K +KLG ++ + 
Sbjct: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK---QKLGKEIVRLQKDL 1953

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972 

L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+ 

SbjCt: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNST--LKQLMREFNTQLAQKEQ 2010 

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22
Identities = 221/952 (23%), Positives = 441/952 (46%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL CMEEAMNSSHD- 56 

+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D 
Sbjct: 1160 LKHLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219 

Query: 57 — KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT EELQTSYY 109 

KK L + +E + SSK L ++ + + +++ L T EL+ 
Sbjct: 1220 CCKKTEALLEAKTNELIN1SSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279 

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE---QLAL 166 

L + Q+ L H + KE+++ + ++ EK L +E Q 

Sbjct: 1280 QLT EEQNT LN I S FQQAT HQLEEKENQI KSMKADI ESLVTEKEALQKEGGWQQQA 1333 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

A +K E + + + +++ + L++ ++K + E+ + Q + V++ 
Sbjct: 1334 ASEK ESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384 

Query: 227 TSPCHIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFAS 286 

S + ++ + ++ + D +Q+L K+ + L E+ AL ++ D+++ 

Sbjct: 1385 NSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKV---DTLSKEKISALEQVD-DWSN 1440 

Query: 287 CTATHRYPPSS — SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337 

+ + S ++ +K++ L E K + +E NL+K+ R + L+ 

Sbjct: 1441 KFSEWKKKAQSRFTQHQNTVKELQIQL-ELKSKEAYEKDEQINLLKEELDQQNKRFDCLK 1499 

Query: 339 AVSEQKRNIM-KDMMKLELDLHGLRE---ETSAHIERKDKDITILQCRLQEL-QLEFTET 392 

E ++ M K LE +L E HI +K +1 L L+ Q + E 

Sbjct: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559 

Query: 393 QKLTLKKDK FLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449 

++L K F + EKD ++E E+K+ ++N + + ELE ++ + ++VK 
Sbjct: 1560 KELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVK 1616 

Query: 450 QDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQK 509 

SKE E KAL+ ++ S + + +R A Q+ A K++ +E+ + + +K 

Sbjct: 1617 SKEEELKALEDRLES--ESAAKLAELKRKAEQKIAAIKKQLL------SQMEEKEEQYKK 1668 

Query: 510 GLLLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568 

G + +T +QE +RE+ +L+++ EQ+ + S+ A+E+D 

Sbjct: 1669 GTESHLSELNTKLQEREREVHILEEKLKSVESSQSETL--IVPRSAKNVAAYTEQEEADS 1726 

Query: 569 E KRQLQK-T VAEQDMKMN D-MLDRI KHQHREQGS I KCKLEEDLQEAT KLLE DK REQ 622 

+ K +K +V ++++ + +L R+ Q +E+ ++ E Q +L+ K E 
Sbjct: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI--KLEH 1782 

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680 

+ +K+HE + M G L E L :+ KK + ++ K E N++A+ LE 
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QMLE 1832 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKAL--QKEKHYLQTTITKEAYDALSR-K 737 

N ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L + 

Sbjct: 1833 NVFDDVQKTLQE--KELTCQ--ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888 

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMKKLNTEL-- 794 

++ T+ LE+ S++ +Q L E + LE ++ E +KL E+ 

Sbjct: 1889 QHDGRNKPTELLEENTEEKSKSHLVQPKLLSKMEAQHNDLEFKLAGAEREKQKLGKEIVR 1948 

Query: 795 --RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852 

+ LR +E + E+- K+ ++ + ++ Q+Q +LK + ++ +REF ++A 
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLHREFNTQLAQ 2007 

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912 
++ L KE Q V + + Q TN Q K K+A EK + R 
Sbjct: 2008 KEQELEMTIKETINKAQ-EVEAELLESH OEETN— QLLK--KIA-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952 

Y L ++ + + + LQ + ++L+K+ ++K + EN 
Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 
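
The summary lines that head each alignment in this report (score, expectation, identity and positive counts) can be checked mechanically. The following is a minimal sketch, not part of the original report: the helper names `parse_counts` and `truncated_percent` are hypothetical, and the separator after each label is matched loosely because the scanned text often renders "=" as another symbol. The percentages printed in this listing appear to be truncated rather than rounded (for example, 132/584 is 22.6% but is printed as 22%), so the sketch uses integer division.

```python
import re

# Hypothetical helper names; "\W" accepts "=" as well as OCR substitutes such as "-".
PAIR = re.compile(r"(Identities|Positives)\s*\W\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_counts(line):
    """Return {label: (matches, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in PAIR.finditer(line)
    }


def truncated_percent(matches, length):
    """Integer percentage, truncated rather than rounded."""
    return 100 * matches // length


line = "Identities = 221/952 (23%), Positives = 441/952 (46%)"
for label, (n, total, reported) in parse_counts(line).items():
    # Print each pair of counts and whether the printed percentage is consistent.
    print(label, n, total, reported, truncated_percent(n, total) == reported)
```

Running the same check over every summary line is a quick way to spot digits that were misread during scanning.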

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22 
Identities = 195/961 (20%), Positives = 435/961 (45%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN-- LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 

+KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 

Sbjct: 657 LKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 

+E E + K H +0+ + K+ V fQ+ + +++ L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG---NHNENTGEKLHLAQEQLALAGDKIASL 174 

L++ + + L K E E+ ++ ++ T E+ +EQLA K+ L 

Sbjct: 772 LKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDL 831 

Query: 175 ERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQ-EPENKGDHSKVRIYTSPCMIQ 233 

E L + + + + ++ + ++ +M Q E +N KV+ T 

Sbjct: 832 ETERILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQ-VYES 890 

Query: 234 EHQETQKRLSEVWQKVSQQDDLIQELRN----KLACSNALVLEREKALIKLQADFASCTA 289 

+ ++ K + Q + +++++I ++R ++ + +E ++ L ++ + 

Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHILNEEYET--- 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

++ + ++ e +K+ K +QE + L E L K+L +S++++ 
Sbjct: 948 — KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA— KLKKELENTALELSQKEKQFNAK 1002 

Query: 350 MMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 

M+++ + + G+ + S +K++ ++ + +EL + +K ++ + LQE 

Sbjct: 1003 MLEMAQANSAGISDAVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIH 1062 

Query: 408 EM-LQELEKKLTQVQNSLLK KEKELEKQQCMATE LEMTVKEAKQD-KSKEAEC 458 

E+ LQE E+++ +++ +L +++E+ K+ E + T+ E ++ K K A 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+L + KLK LE+ + + ++ +E+ E+ +RK+ + L K K 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE--LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

T +E Q +K + E + +K EEL+++L +-K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN--ELIN 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

K N +L RI H QHR K++E L T + + QL++ E + + 

Sbjct: 1238 ISSSKTNAILSRISHCQHRTT-----KVKEALLIKTCTVSELEAQLRQLTEEQNTLNISF 1292 

Query: 638 EALRQEFKKKD— KTLKENSRKLEEENENLR AELQCCSTQLESSL 680 

+ + ++K+ K++K + L E E L+ +E + C TQL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK +++ K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +.+ + g +++ + + + + ++D + KE L E+ + A + E 

Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916 

LED + + T + N+ ++ N Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLHE-VLKNYNQ QKDIEKK---ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

+L EKD+ ++ L+ + +K E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22 
Identities = 207/886 (23%), Positives = 412/886 (46%) 
Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106 

+ EN++ Q EEE+SK ++L + LQ+E + 

Sbjct: 1281 LTEEQNTLNISFQQATHQLEEKENQIKSMKA D I ES LVT EKEALQKEGGNQQQAAS E 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ +++ + N + L++++ A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAA- 1395 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

I+SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWK-K 1447 

Query: 227 TS PCM I QEHQETQKRLS EVWQKVSQQDDLIQEL — RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLBTELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI--TILQCRLQEL 385 

LV+ L+ + + E+K N +K+ + L L A +E K K++ L + +E 

Sbjct: 1562 LVQKLQ-HFQEI£EEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+L+ E + L+ + + E+ ++ E+K+ + + LL + +E E+Q TE ++ 
Sbjct: 1621 ELKALEDR---LESES-AAKUAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGTESHLSE 1676 

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504 

K + +E E L+ +++ +++S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYE 1735 

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E ++L A K 
Sbjct: 1736 EKIS---VLQRNLTEKEKLLQRVGQ--EKEETVSSHFEM--RCQYQERLIKLEHAEAKQH 1788 

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG--SIKCK--LE EDLQ E 611 

LQ+ + E++ K + ++ +H +E G +1+ K LE +D+Q E 
Sbjct: 1789 EDQSMIGKLQEELEEKNKKYSLIV--AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846 

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T ++LE K ++L +K + E+E L +++K + + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMDGRNKPTELLEENTEE 1906 

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L+KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDL-RMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYE 784 

I K+ YD R+ Q+ + LE L H ++ + +++ TQ +K+ +LE I + 
Sbjct: 1965 EILKKEYDQ-EREEKIKQEQ--EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E + K +L HQE E + KK+ E + + K+++ ++L A+EE++ 
Sbjct: 2018 ETINKAQEVEAELLESHQE ETNQLLKKIAEKDDDLKRTAKRYE EILDAREEEMT 2071 

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903 

++ EL+++ LQ PD + ++TLQK +++ K 

Sbjct: 2072 AK V RDLQTQLE ELQKK YQQK- - LEQEEN PGN DNVT I M ELQTQLAQ — KTT L I S DSK 2123 

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ++ + +L + ++++ V HL 
Sbjct: 2124 LKEQEFREQIHNLEDRLKKYEKNVYATTVGHL 2155 

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20 
Identities = 209/938 (22%), Positives = 432/938 (46%) 

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

+ + ++ +E+ +L KLL + +K L + + +K Q N +E A NS+ 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + +++L EELQ + ++ + 
Sbjct: 1017 AVSRLETNQKE-QI ESLTEVHRRELNDV ISIWEKKLNQQAEELQ-EIHEIQLQEK— 1069 

Query: 119 EKQTSDLV--LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E++ ++L +L C+ +E ++ I + +E G + T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQK1LLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI--MGQEPENKGDHSKVRIYTSPCMIQ 233 
L ++K+ LN LE LQ +L + + +E + K ++ T+ Q 
Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT--FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186 

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL--AC--SNALVLEREKALIKLQADFA 285 

H+++ K L + K + 1* +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED— KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 

+ + + + + ++I + ++Q + E QN + + E+K 

Sbjct: 1244 NAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKE 1303 

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402 

N +K M K +++ L +E ++ + +■+ +L+ E +E +TL K+ + 

Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462 

L+EK + L K+LT + N L+ L +++ + L EK+ ++ L 

Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ--DLS 1418 

Query: 463 AEVQK LKN S LE EAKQQERLAAQQAAQC KEEAALAGC H LE DTQRKLQKGLLLDKQKA 518 

+V L A +Q + + ++ K++A ++T ++LQ L L ++A 

Sbjct: 1419 FKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

+ I L+ EL K + E ++ ++E+ L +L++ +L+ + 

Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET---ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

++ +++ + + +K+ + +q 1+ K L + LQ +L E+K ++K+++E +E ++ 
Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594 

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 

+++ E + KKL+ + ++ + E L+A L+ +LES S K ++ + ++ 

Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE-— DRLESESAAKL AELKRKAEQK 1647 

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756 

IA K+ L+S Q++ +KE+ Y + T + L+ K ♦ ++ EKL V 

Sbjct: 1648 IAAIKKQLLS— QME— EKEEQYKKGT— ESHLSELNTKLQEREREVHILEEKLKSVE 1699 

Query: 757 S ET KSLQQSLTQTQEKKAQLEEEI I - AYEERMKKLNTELRKLRGFHQESELEV 808 

S ET +S + T++++A + + YEE++ L L E E + 

Sbjct: 1700 SSQSETLI VPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752 

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 

+ + EE + + Q+Q L L E + E Q + L+E L E +K+ + 

Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLIV 1812 

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYIAKLSGEKDHL 925 

V K+ + N Q NLE + QK EK L Q+ EQ + + + > 

Sbjct: 1813 AQHVEKEGGK---NNIQAKQNLENVFDDVQKTLQEKELTCQILEQKIKELDSCLVRQKEV 1869 

Query: 926 HSV-MVHLQQENKKLK 940 

H V M L + +KL+ 
Sbjct: 1870 HRVEMEELTSKYEKLQ 1885 

Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14 
Identities = 160/716 (22%), Positives = 318/716 (44%) 

Query: 233 QEHQETQKRLSEVWQKVSQQDDLIQE-LRNKLACSNALV-LEREKALIKL-QADFASCTA 289 

♦E +TQ ++ +V + L + ++ L S++ LR + L+D STA 
Sbjct: 53 RESGDTQSFAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTA 112 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

+ p E ED+ L +++ QL ++ R+ + ++++ 

Sbjct: 113 SFDPPSDMDSEAEDLVGNSDSLNKEQLIQRLR--RMERSLSSYRGKYSELVTAYQMLQRE 170 

Query: 350 MMKLELDLHGLREETSAHIERKDKDIT-ILQCRLQELQLEFTETQKLTLKKDKFLQEKDE 408 

KL+ G+ ++ +DK + I + R +ELQ++ + L + D L+EKD+ 

Sbjct: 171 KKKLQ GILSQS QDKSLRRIAELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAE V 465 

+ L+ +++ ++ L ++ + + +LE + ++++ E++ + + + V 

Sbjct: 220 YISVLQTQVSLLKQRLRNGPMNVDVLKPLP-QLEPQAEVFTKEENPESDGEPVVEDGTSV 278 

Query: 466 QKLKNSLEEAKQQERLA--AQQAAQC-KEEAALAGCHLEDTQRKLQKGLL-LDKQKADTI 521 

+ L+ + K+QE L ++ Q KE+ L E Q +L + L L+K K + 

Sbjct: 279 KTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHM 338 

Query: 522 QELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQD 581 

E++L+ ++E++ + E ++EL E +R K+Q 

Sbjct: 339 AEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMHETLEMKEEEI AQLRSRIKQMTTQG 398 
Query: 582 MKMNDMLDRIKHQHREQGSIKCKLEEDLQEAT-KLLEDKREQLK---KSKEHEKL-MEGE 636 
           + + + ++ + E+ + +EA KL + EQ+K K+ E E++ ++ E 
Sbjct: 399 EELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQE 458 

Query: 637 LEALRQEFKK-KDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNK 695 
           L ++QE K+ + E KL++ +E EL +L L T ++ Q+ K 
Sbjct: 459 LSRVKQEVVDVMKKSSEEQIAKLQKLHEK---ELARKEQELTKKLQ---TREREFQEQMK 512 

Query: 696 EIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLN-H 754 
           +AL+K L+ +K Q+ + + K+A S DL Q E 
Sbjct: 513 -VALEKSQSEYLKISQEKEQQESLALEELELQKKAILTESENKLR---DLQQEAETYRTR 568 

Query: 755 VTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEV--HAFD 812 
           + SL++SL QE K Q ++ + E K N E+ + H+ +ELE H D 
Sbjct: 569 ILELESSLEKSL---QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHQQD 624 

Query: 813 KKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLRE---------FQEEMAALKENLLED-DK 862 
           E QVL+ +Q+Q +++ L K EQ +E FQ + + E LE D 
Sbjct: 625 ALWTE-KLQVLK--QQYQTEMEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDV 681 

Query: 863 EPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLSGEK 922 
           + LS++++++L Q ++L ++ EQ N+ + 
Sbjct: 682 KQTELE--SLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSI 739 

Query: 923 DHLHSVMVHLQQENKKLKKEIEEKKM 948 
           H V + Q+ K LK +I + ++ 
Sbjct: 740 IKEHEVSI--QRTEKALKDQINQLEL 763 

Score = 183 (27.5 bits), Expect = 1.3e-09, P = 1.3e-09 
Identities = 132/584 (22%), Positives = 251/584 (42%) 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK-QDKSKEAECKALQAEVQK 467 
           M ++L++K+++ Q L + + +T M + + ++ E +Q 
Sbjct: 1 MFKKLKQKISEEQQQLQQALAPAQASSNSSTPTRMRSRTSSFTEQLDEGTPNRESGDTQS 60 

Query: 468 LKNSLE-EAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA--DTIQEL 524 
           L+ EL + ++ + + R+ L LD AD ++ 
Sbjct: 61 FAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTASFDPPSDM 120 

Query: 525 QRELQMLQKESSMAEKEQTSNRKRVEELSL-----ELSEALRKLENSDKEKRQLQKTVAE 579 
           E + L S KEQ R R E SL + SE + + +EK++LQ +++ 
Sbjct: 121 DSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQ 180 

Query: 580 -QDMKMNDMLDRIKHQHREQGSIKCKLEE---DLQEATK---LLEDKREQLKKSKEHEKL 632 
           QD + + + + +Q + K EE L+E + +L+ + LK+ + + 
Sbjct: 181 SQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQVSLLKQRLRNGPM 240 

Query: 633 MEGELEALRQ-EFKKKDKTLKENSRKLEE---ENENLRAELQCCSTQLESSLNKYNTSQQ 688 
           L+ L Q E + + T +EN E E+ L+ +++ N ++ 
Sbjct: 241 NVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKE 300 

Query: 689 VIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQA 748 
           IQ ++ h +LQ QLD+ LQ E ++ E +++ A +L + 
Sbjct: 301 TIQSHKEQCTLLTSEKEALQEQLDERLQ-ELEKIKDLHMAEKTKLITQLRDA--KNLIEQ 357 

Query: 749 LEK-LNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELE 807 
           LE+ V +ETK + + +T E K EEEI R+K++ T + +LR Q+ + E 
Sbjct: 358 LEQDKGHVIAETK---RQMHETLEMK---EEEIAQLRSRIKQMTTQGEELR--EQKEKSE 409 

Query: 808 VHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQ----EEMAALKENLLEDDKE 863 
           AF EE+ + QK + K+A +EQ++ + EE +L++ L +E 
Sbjct: 410 RAAF----EELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQELSRVKQE 465 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEKLGNQLR-----EQVNYIAK 917 
           + + S + +L + +++ + EQ K+ + + Q++ Q Y+ K 
Sbjct: 466 VVDVMKKSSEEQIAKLQKLHEKELARKEQELTKKLQTREREFQEQMKVALEKSQSEYL-K 524 

Query: 918 LSGEKDHLHSVMVH-LQQENKKLKKEIEEK----KMKAENTRLCTKALGPSRTESTQREK 972 
           +S EK+ S++ L+ + K+ EEK + +AE R L S +S Q K 
Sbjct: 525 ISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILELESSLEKSLQENK 584 
[LENGTH] 1007 
[MW] 
[pI] 5.90 
[HOMOL] TREMBL:AF092090_1 product: "cplSl"; Rattus norvegicus cplSl mRNA, partial cds. 0.0 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 5e-04 
[EC] 3.6.1.32 Myosin ATPase 1e-16 
[PIRKW] nucleus 3e-10 
[PIRKW] phosphotransferase 6e-09 
[PIRKW] duplication 2e-06 
[PIRKW] citrulline 2e-12 
[PIRKW] tandem repeat 1e-16 
[PIRKW] endocytosis 2e-13 
[PIRKW] heart 8e-13 
[PIRKW] transmembrane protein 1e-13 
[PIRKW] serine/threonine-specific protein kinase 6e-09 
[PIRKW] zinc finger 2e-13 
[PIRKW] metal binding 2e-13 
[PIRKW] DNA binding 4e-12 
[PIRKW] muscle contraction 1e-16 
[PIRKW] acetylated amino end 1e-11 
[PIRKW] actin binding 1e-16 
[PIRKW] mitosis 5e-15 
[PIRKW] microtubule binding 5e-15 
[PIRKW] ATP 1e-16 
[PIRKW] thick filament 1e-16 
[PIRKW] phosphoprotein 4e-16 
[PIRKW] skeletal muscle 2e-14 
[PIRKW] calcium binding 2e-12 
[PIRKW] alternative splicing 1e-16 
[PIRKW] coiled coil 1e-16 
[PIRKW] P-loop 1e-16 
[PIRKW] heptad repeat 3e-10 
[PIRKW] methylated amino acid 1e-16 
[PIRKW] immunoglobulin receptor 2e-06 
[PIRKW] peripheral membrane protein 2e-13 
[PIRKW] cardiac muscle 8e-13 
[PIRKW] hydrolase 1e-16 
[PIRKW] microtubule 3e-10 
[PIRKW] muscle 6e-13 
[PIRKW] EF hand 2e-12 
[PIRKW] cytoskeleton 2e-15 
[PIRKW] hair 2e-12 
[PIRKW] calmodulin binding 2e-13 
[PIRKW] Golgi apparatus 3e-10 
[SUPFAM] myosin heavy chain 1e-16 
[SUPFAM] conserved hypothetical P115 protein 1e-07 
[SUPFAM] centromere protein E 5e-15 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09 
[SUPFAM] calmodulin repeat homology 2e-12 
[SUPFAM] myosin motor domain homology 1e-16 
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07 
[SUPFAM] plectin 2e-07 
[SUPFAM] trichohyalin 2e-12 
[SUPFAM] pleckstrin repeat homology 8e-08 
[SUPFAM] ribosomal protein S10 homology 2e-07 
[SUPFAM] giantin 3e-13 
[SUPFAM] protein kinase homology 6e-09 
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08 
[SUPFAM] kinesin motor domain homology 5e-15 
[SUPFAM] human early endosome antigen 1 2e-13 
[SUPFAM] MS protein 1e-07 
[PROSITE] LEUCINE_ZIPPER 7 
[PROSITE] MYRISTYL 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 20 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 15.00 % 
[KW] COILED_COIL 42.40 % 

SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx. . . 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccccccc 

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccccccccc 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccc 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCC CCCCCCCCCCCCCC 

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG . . .xxxxxxxxxx xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC 

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC 

SEG 

PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 

COILS 



Prosite for DKFZphtes3_lgl3.1 

PS00001    52->56    ASN_GLYCOSYLATION    PDOC00001 
PS00001   684->688   ASN_GLYCOSYLATION    PDOC00001 
PS00004   240->244   CAMP_PHOSPHO_SITE    PDOC00004 
PS00004   415->419   CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    74->77    PKC_PHOSPHO_SITE     PDOC00005 
PS00005   110->113   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   238->241   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   290->293   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   392->395   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   396->399   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   444->447   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   503->506   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   544->547   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   566->569   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   600->603   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   650->653   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   655->658   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   735->738   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   876->879   PKC_PHOSPHO_SITE     PDOC00005 
PS00005   968->971   PKC_PHOSPHO_SITE     PDOC00005 
PS00006    39->43    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    53->57    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    68->72    CK2_PHOSPHO_SITE     PDOC00006 
PS00006   116->120   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   190->194   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   250->254   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   296->300   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   439->443   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   444->448   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   471->475   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   520->524   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   536->540   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   566->570   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   576->580   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   650->654   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   674->678   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   804->808   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   888->892   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   963->967   CK2_PHOSPHO_SITE     PDOC00006 
PS00006   968->972   CK2_PHOSPHO_SITE     PDOC00006 
PS00007   135->143   TYR_PHOSPHO_SITE     PDOC00007 
PS00008   207->213   MYRISTYL             PDOC00008 
PS00008   599->605   MYRISTYL             PDOC00008 
PS00029    83->105   LEUCINE_ZIPPER       PDOC00029 
PS00029    90->112   LEUCINE_ZIPPER       PDOC00029 
PS00029    97->119   LEUCINE_ZIPPER       PDOC00029 
PS00029   104->126   LEUCINE_ZIPPER       PDOC00029 
PS00029   403->425   LEUCINE_ZIPPER       PDOC00029 
PS00029   410->432   LEUCINE_ZIPPER       PDOC00029 
PS00029   918->940   LEUCINE_ZIPPER       PDOC00029 


(No Pfam data available for DKFZphtes3_lgl3.1) 



DKFZphtes3_lkll 



group: cell structure and motility 
DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus 
actin-binding protein (ENC-1). 

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural 
induction in vertebrates. The protein is related to the kelch family proteins and is expressed 
during early gastrulation in the prospective neuroectodermal region of the epiblast and later 
in development throughout the nervous system (NS). ENC-1 functions as an actin-binding protein 
organising the actin cytoskeleton during neural differentiation and development of the NS. 
The novel protein is highly similar to ENC-1. 

The new protein can find application in modulation of cytoskeleton organisation in human 
testicular cells. 



strong similarity to mouse ENC-1 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3525 bp 

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499 
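
The two positions quoted above can be located with a simple string search. The following is a minimal sketch under stated assumptions: the signal is taken to be the canonical AATAAA hexamer (the annotation says only "polyadenylation signal"), the minimum run length of eight A residues is an arbitrary choice, and the function names are hypothetical. The test string is the final 75 nucleotides of the sequence listing for this clone (positions 3451 to 3525).

```python
import re


def find_signal(seq, motif="AATAAA"):
    """0-based offset of the first occurrence of the motif, or -1 if absent."""
    return seq.upper().find(motif)


def find_polya_run(seq, min_len=8):
    """0-based offset of the first run of at least min_len A residues, or -1."""
    m = re.search("A{%d,}" % min_len, seq.upper())
    return m.start() if m else -1


# Last 75 nt of the insert; the first base of TAIL is base 3451 of the listing.
TAIL_OFFSET = 3450
TAIL = (
    "CATGGTGCTT" "TCCCCAAAAG" "TTGTGTTGCT" "TTTATCAGTT" "TTCTAACTTA"
    "ATAAAAAGAG" "TTGAG" "AAAAAAAAAA"
)

print(TAIL_OFFSET + find_signal(TAIL))     # offset of the AATAAA hexamer
print(TAIL_OFFSET + find_polya_run(TAIL))  # offset at which the poly(A) run begins
```

On this input the sketch gives offsets 3499 and 3515, which agree with the annotated positions if those are read as 0-based offsets; counted from 1, the hexamer would start at base 3500 and the poly(A) run at base 3516.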



1 GGTGCAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC 

51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC 

101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG 

151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC 

201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC 

251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC 

301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC 

351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG 

401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT 

451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT 

501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC 

551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG 

601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGC 

651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC 

701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA 

751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT 

801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC 

851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC 

901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA 

951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC 

1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA 

1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC 

1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG 

1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA 

1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT 

1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA 

1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA 

1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC 

1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG 

1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT 

1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC 

1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC 

1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA 

1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA 

1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC 

1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC 

1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG 

1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG 

1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC 

1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG 

2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGCTTTAG 

2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA 

2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG 

2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG 

2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG 

2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTGCCA CACATCCTTT 

2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC 

2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA 

2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC 

2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC 

2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT 

2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG 

2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT 

2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC 

2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC 
2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGRGGGAC CCCAGGGTGT 

2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG 

2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG 

2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC 

2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT 

3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC 

3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG 

3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA 

3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT 

3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC 

3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC 

3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC 

3351 AGCAGCAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG 

3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC 

3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA 

3501 ATAAAAAGAG TTGAGAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98350113: 

Cloning of human ENC-1 and evaluation of its expression 
and regulation in nervous system tumors. 

97252647: 

ENC-1: a novel mammalian kelch-related gene specifically expressed in 
the nervous system encodes an actin-binding protein. 

98234394: 

NRP/B, a novel nuclear matrix protein, associates with 
p110(RB) and is involved in neuronal differentiation. 



Peptide information for frame 2 



ORF from 116 bp to 1882 bp; peptide length: 589 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
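
The ORF coordinates and the peptide length quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name `orf_summary` is hypothetical, and it assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range.

```python
def orf_summary(start_bp: int, end_bp: int) -> dict:
    """Derive codon count and reading frame from ORF coordinates.

    Assumes 1-based, inclusive coordinates that exclude the stop codon.
    """
    length_nt = end_bp - start_bp + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "nucleotides": length_nt,
        "codons": length_nt // 3,
        # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
        "frame": (start_bp - 1) % 3 + 1,
    }


frame2 = orf_summary(116, 1882)
print(frame2)
```

Under these assumptions the range 116 to 1882 spans 1767 nucleotides, that is 589 codons in frame 2, which agrees with the peptide length stated in the report.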



1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL 

51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL 

101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL 

151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD 

201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV 

251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL 

301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS 

351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA 

401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG 

451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD 

501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ 

551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA 

BLASTP hits 

Entry MMU65079_1 from database TREMBL: 

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus 
actin-binding protein (ENC-1) mRNA, complete cds. 

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589 

Entry AF059611_1 from database TREMBLNEW: 

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens 
nuclear matrix protein NRP/B (NRPB) mRNA, complete cds. 

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589 

Entry AF010314_1 from database TREMBL: 

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA, 
complete cds. 

Score = 1745, P = 7.8e-180, identities = 335/507, positives = 403/507 

Entry KELC_DROME from database SWISSPROT: 

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring 
canal protein"; Drosophila melanogaster ring canal protein and ORF2 
mRNA, complete cds. 

Score = 672, P = 3.9e-66, identities = 168/536, positives = 257/536 
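
The identity and positive counts in each hit line are easier to compare as percentages. A minimal sketch follows, assuming a cleaned score line of the form `identities = a/b, positives = c/d`; the names `HIT_RE` and `hit_percentages` are hypothetical and not part of any BLAST tool.

```python
import re

# Matches the "identities = a/b, positives = c/d" part of a cleaned hit line.
HIT_RE = re.compile(
    r"identities\s*=\s*(\d+)/(\d+),\s*positives\s*=\s*(\d+)/(\d+)"
)


def hit_percentages(line: str) -> tuple:
    """Return (percent identity, percent positives) for one score line."""
    match = HIT_RE.search(line)
    if match is None:
        raise ValueError("no identities/positives fields found")
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return 100.0 * ident / ident_len, 100.0 * pos / pos_len


line = "Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589"
pct_identity, pct_positive = hit_percentages(line)
print(f"{pct_identity:.1f}% identical, {pct_positive:.1f}% positive")
```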



Alert BLASTP hits for DKFZphtes3_1k11, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_1k11, frame 2 

Report for DKFZphtes3_1k11.2 

[LENGTH] 589 

[MW] 65923.45 

[pI] 6.10 

[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus 
actin-binding protein (ENC-1) mRNA, complete cds. 0.0 

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 2e-09 

[BLOCKS] BL01016D Glycoprotease family proteins 

[PIRKW] zinc finger 1e-08 

[PIRKW] DNA binding 1e-08 

[PIRKW] transcription factor 1e-08 

[SUPFAM] POZ domain homology 3e-68 

[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15 

[SUPFAM] A55R protein 5e-29 

[SUPFAM] hypothetical protein YHR158c 4e-08 

[SUPFAM] A55R protein middle region homology 5e-29 

[SUPFAM] myxoma virus M9-R protein 1e-14 

[SUPFAM] A55R protein carboxyl-terminal homology 5e-29 

[KW] Alpha_Beta 

SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 
PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh 

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL 
PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh 

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR 
PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh . 

SEQ QSEDFNSLSKDTLLDLISSDELETEDERVVFEAILQWVKHDLEPRKVHLPELLRSVRLAL 
PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc 

SEQ LPSDCLQEAVSSEALLMADERTKLIMDEALRCKTRILQNDGVVTSPCARPRKAGHTLLIL 
PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee 

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV 
PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee 

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG 
PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc 

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ 
PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc 

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL 
PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee 

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA 
PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc 



(No Prosite data available for DKFZphtes3_1k11.2) 
(No Pfam data available for DKFZphtes3_1k11.2) 

DKFZphtes3_1n3 



group: signal transduction 

DKFZphtes3_1n3 encodes a novel 1196 amino acid protein with similarity to the S. pombe Tup1 
protein. 

The protein contains one WD-40 repeat, which is typical of the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. In addition, an RGD site is present. 

The new protein can find application in modulating or blocking G-protein-dependent pathways. 



similarity to Tup1p 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="6q24" 
Insert length: 5277 bp 

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244 
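
The two reported positions can be reproduced from the final rows of the sequence listing with a plain string search. This is a sketch under stated assumptions, not the method used to generate the report: the signal is taken to be the canonical hexamer AATAAA, a poly A stretch is taken to be a run of at least ten A residues, and positions are treated as 0-based offsets into the insert. The helper names are hypothetical.

```python
import re

# Last 77 bases of the insert as printed in the listing (rows 5201 and 5251).
TAIL = (
    "AAGCTTGTTT" "TTTTCTGTCT" "TGTGAATGCA" "CTTGATAATT"
    "TAAAAATAAA" "AATATCTGTT" "TCTCTGCAAA" "AAAAAAA"
)
TAIL_OFFSET = 5200  # 0-based index of the first base of TAIL


def first_signal(seq: str, offset: int, motif: str = "AATAAA"):
    """0-based position of the first occurrence of the motif, or None."""
    idx = seq.find(motif)
    return None if idx < 0 else offset + idx


def first_poly_a(seq: str, offset: int, min_len: int = 10):
    """0-based position of the first run of at least min_len A residues."""
    match = re.search("A{%d,}" % min_len, seq)
    return None if match is None else offset + match.start()


signal_pos = first_signal(TAIL, TAIL_OFFSET)
poly_a_pos = first_poly_a(TAIL, TAIL_OFFSET)
print(signal_pos, poly_a_pos)
```

With 0-based offsets this fragment gives 5244 for the signal and 5267 for the poly A stretch; with 1-based numbering each value would be one higher.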



1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA 
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC 
101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC 
151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG 
201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA 
251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG 
301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC 
351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA 
401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA 
451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC 
501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG 
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC 
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA 
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG 
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC 
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG 
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC 
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA 
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT 
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA 
1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC 
1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT 
1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT 
1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA 
1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA 
1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG 
1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC 
1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA 
1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT 
1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT 
1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT 
1551 TGTTGAGGCA TTTGAATGGT GGTCAAAATG TCCAAGAAAT CATTACCCAT 
1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG 
1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT 
1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT 
1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT 
1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG 
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT 
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA 
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC 
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG 
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA 
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA 
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG 
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA 
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC 
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA 
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA 
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA 
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG 
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG 
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG 
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG 
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT 
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA 
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG 
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA 
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT 
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC 
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT 
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC 
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC 
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG 
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA 
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA 
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA 
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT 
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC 
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA 
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC 
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA 
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA 
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG 
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG 
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT 
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG 
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA 
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC 
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG 
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG 
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT 
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC 
4101 ATGCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA 
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT 
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC 
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG 
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT 
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC 
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA 
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC 
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC 
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA 
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA 
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA 
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG 
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA 
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC 
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT 
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGAAATATC 
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA 
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT 
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC 
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG 
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC 
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA 
5251 AATATCTGTT TCTCTGCAAA AAAAAAA 
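
A numbered listing such as the one above can be joined into a single sequence while checking that each row label matches the running base count. The sketch below is illustrative; `parse_listing` is a hypothetical helper, and the middle example row carries a deliberately wrong label to show how a damaged label would be flagged.

```python
import re

# A block is a run of nucleotide letters (IUPAC ambiguity codes allowed).
BLOCK = re.compile(r"^[ACGTRYKMSWBDHVN]+$")


def parse_listing(lines, start=1):
    """Join numbered rows into one sequence.

    Returns (sequence, mislabeled), where mislabeled lists
    (label_as_printed, expected_position) for rows whose label is wrong.
    """
    sequence = ""
    mislabeled = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        blocks = [t for t in tokens if BLOCK.match(t)]
        # Everything that is not a sequence block is treated as the label.
        label = "".join(t for t in tokens if not BLOCK.match(t))
        expected = start + len(sequence)
        if label != str(expected):
            mislabeled.append((label, expected))
        sequence += "".join(blocks)
    return sequence, mislabeled


rows = [
    "751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG",
    "601 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC",
    "851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA",
]
sequence, mislabeled = parse_listing(rows, start=751)
print(len(sequence), mislabeled)
```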



BLAST Results 



Entry HS32B1 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1 
Score = 4445, P = 0.0e+00, identities = 889/889 

Entry U93816 from database EMBL: 

Human exon-trapped sequence from 6q24. 

Score = 965, P = 4.0e-35, identities = 193/193 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein 
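
The start of the peptide can be checked against the cDNA by translating from the quoted ORF start. The following is a minimal sketch using the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical, and the ORF start is assumed to be a 1-based coordinate.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start_bp: int, n_codons: int) -> str:
    """Translate n_codons codons starting at the 1-based position start_bp."""
    begin = start_bp - 1
    codons = (dna[i:i + 3] for i in range(begin, begin + 3 * n_codons, 3))
    # Codons with ambiguity codes or truncated codons become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First row (bases 1-50) of the DKFZphtes3_1n3 cDNA listing, spaces removed.
FIRST_ROW = "GCTGCATAAAGCTGAGAGATGCCTACAGCTGAGAGTGAAGCAAAAGTAAA"
n_terminus = translate(FIRST_ROW, 19, 10)
print(n_terminus)
```

Translating ten codons from base 19 yields MPTAESEAKV, the first ten residues of the frame 1 peptide.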

1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR 

51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK 

101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV 

151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM 

201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS 

251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK 

301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH 

351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI 

401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL 

451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY 

501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM 

551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK 

601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL 

651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV 

701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL 

751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG 

801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST 

851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN 

901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC 

951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN 

1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV 

1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY 

1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS 

1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_1n3, frame 1 

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces 
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete 
cds., N = 1, Score = 186, P = 1e-10 

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa 
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18 

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin"; 
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14 

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N = 
2, Score = 228, P = 1e-13 

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1, 
N = 1, Score = 215, P = 2.3e-13 

SWISSPROT:YZLL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13 



>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6 
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein 
(Pmc733) mRNA, complete cds. 
Length = 321 

HSPs: 

Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18 
Identities = 59/225 (26%), Positives = 111/225 (49%) 

Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706 

+ E GH + I DLSWSK+ +L++S D T R+W ++ + +V H ++V +F+ 
Sbjct: 63 VHEFYGHGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119 

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766 

P +TGC D ++RIW V LV + K + ++C+ +G +G TG 

Sbjct: 120 PTNGN Y F I TGC I DGLVRI W DVRK C LVV DWAN S KE I VTA VC YRPDGKGAV AGT I TG 174 

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826 
++ +LE V ++N K +4 Y P K+L++ + D+ +RI 
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229 

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871 

+D +++ +G+ +++TPG++ S+D +Y+WN 
Sbjct: 230 LDGAHVISN-YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272 



Pedant information for DKFZphtes3_1n3, frame 1 



Report for DKFZphtes3_1n3.1 

[LENGTH] 1196 

[MW] 137114.70 

[pI] 6.79 

[HOMOL] SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN 
C14B1.4 IN CHROMOSOME III. 8e-21 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11 

[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c 
TAF90 - TFIID subunit] 4e-10 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 
4e-10 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 
9e-08 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, 
YDL145c] 9e-08 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06 

[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 4e-06 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 
YMR116c] 4e-06 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 4e-05 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 
4e-05 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 4e-05 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YCR088w] 6e-05 

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05 

[BLOCKS] BL00024H 

[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-91 

[SCOP] d1gfc__ 2.21.2.1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14 

[SCOP] d1fmk_1 2.21.2.1.8 (1-64) c-src tyrosine kinase [human (Hom 5e-15 

[SCOP] d1ad5b1 2.21.2.1.7 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15 

[SCOP] d1lcka1 2.21.2.1.16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13 

[SCOP] d1qwea_ 2.21.2.1.15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15 

[SCOP] d1shg__ 2.21.2.1.6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13 

[SCOP] d1prmc_ 2.21.2.1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15 

[SCOP] d1hsq__ 2.21.2.1.12 Phospholipase C, SH3 domain [human (Hom 2e-13 

[SCOP] d1aboa_ 2.21.2.1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13 

[SCOP] d1efna_ 2.21.2.1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15 

[SCOP] d1sema_ 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13 

[SCOP] d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16 

[SCOP] d1ckaa_ 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15 

[EC] 3.1.4.3 Phospholipase C 2e-07 

[EC] 3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07 

[EC] 3.6.1.32 Myosin ATPase 7e-07 

[EC] 2.7.1.112 Protein-tyrosine kinase 8e-06 

[PIRKW] nucleus 2e-08 

[PIRKW] phosphotransferase 8e-06 

[PIRKW] plasma 4e-07 

[PIRKW] duplication 4e-07 

[PIRKW] phosphoric diester hydrolase 2e-07 

[PIRKW] tandem repeat 7e-07 

[PIRKW] hormone 4e-07 

[PIRKW] transmembrane protein 2e-06 

[PIRKW] stomach 4e-07 

[PIRKW] actin binding 7e-07 

[PIRKW] ATP 7e-07 

[PIRKW] phosphoprotein 7e-07 

[PIRKW] signal transduction 7e-09 

[PIRKW] heterotrimer 7e-09 

[PIRKW] P-loop 7e-07 

[PIRKW] hydrolase 7e-07 

[PIRKW] transcription regulation 5e-06 

[PIRKW] GTP binding 7e-09 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07 

[SUPFAM] SH3 homology 2e-07 

[SUPFAM] SH2 homology 2e-07 

[SUPFAM] protozoan myosin heavy chain IB 7e-07 

[SUPFAM] myosin motor domain homology 7e-07 

[SUPFAM] pleckstrin repeat homology 2e-07 

[SUPFAM] protein-tyrosine kinase src 8e-06 

[SUPFAM] WD repeat homology 3e-12 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07 

[SUPFAM] protein kinase homology 8e-06 

[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07 

[SUPFAM] GTP-binding regulatory protein beta chain 7e-09 

[SUPFAM] yeast coatomer complex alpha chain 4e-07 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 25 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 19 

[PROSITE] ASN_GLYCOSYLATION 6 

[PFAM] Src homology domain 3 

[PFAM] WD domain, G-beta repeats 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 5.77 % 

[KW] COILED_COIL 2.42 % 



SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT 

SEG xxxxxxxx 

COI LS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

IgotB 



SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED 

SEG 

COILS 

IgotB 

SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTHQKTHTKPQPGVDHQKSEKANEGREET 

SEG xxx 

COILS 

IgotB 

SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE 

SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx 

COILS 

IgotB 

SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK 

SEG xxxxxxxxxx xxxx 

COILS 

IgotB 

SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM 

SEG xxxxxxxxx 

COILS 

IgotB 

SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW 

SEG 

COILS 

IgotB 

SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK 

SEG 

COILS 

IgotB 

SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP 
SEG 

COILS 

IgotB 

SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK 

SEG 

COILS 

IgotB 
SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL 

SEG 

COILS 



SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS 

SEG 

COILS 

IgotB EETTTTTEEEEEETTTEEEEEETT--TTCEEEEEETTTCEEEEEETTT-TCEEEEEETTT 

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL 

SEG 

COILS 

IgotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEEEEEE 

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA 

SEG 

COILS 

IgotB 

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN 

SEG 

COILS 

IgotB 

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS 

SEG 

COILS 

IgotB 

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN 

SEG 

COILS 

IgotB 

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV 

SEG 

COILS 

IgotB 

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP 

SEG 

COILS 

IgotB 

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE 

SEG 

COILS 

IgotB 



IgotB 



, CEEEEEECCCCCEEEE 



Prosite for DKFZphtes3_1n3.1 

PS00001  460->464    ASN_GLYCOSYLATION  PDOC00001 
PS00001  686->690    ASN_GLYCOSYLATION  PDOC00001 
PS00001  934->938    ASN_GLYCOSYLATION  PDOC00001 
PS00001  1000->1004  ASN_GLYCOSYLATION  PDOC00001 
PS00001  1065->1069  ASN_GLYCOSYLATION  PDOC00001 
PS00001  1148->1152  ASN_GLYCOSYLATION  PDOC00001 
PS00004  91->95      CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  264->268    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  305->309    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  1190->1194  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  48->51      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  66->69      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  93->96      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  170->173    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  232->235    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  268->271    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  304->307    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  327->330    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  352->355    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  384->387    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  440->443    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  533->536    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  546->549    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  643->646    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  677->680    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  690->693    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  702->705    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  823->826    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  973->976    PKC_PHOSPHO_SITE   PDOC00005 
PS00006  22->26      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  59->63      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  77->81      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  116->120    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  137->141    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  180->184    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  245->249    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  276->280    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  283->287    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  288->292    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  292->296    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  327->331    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  390->394    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  454->458    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  510->514    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  570->574    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  663->667    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  672->676    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  804->808    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  985->989    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1023->1027  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1127->1131  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1132->1136  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1161->1165  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  1170->1174  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  211->219    TYR_PHOSPHO_SITE   PDOC00007 
PS00007  1083->1091  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  210->219    TYR_PHOSPHO_SITE   PDOC00007 
PS00008  483->489    MYRISTYL           PDOC00008 
PS00008  577->583    MYRISTYL           PDOC00008 
PS00008  716->722    MYRISTYL           PDOC00008 
PS00008  800->806    MYRISTYL           PDOC00008 
PS00008  861->867    MYRISTYL           PDOC00008 
PS00008  941->947    MYRISTYL           PDOC00008 
PS00009  811->815    AMIDATION          PDOC00009 
PS00009  1188->1192  AMIDATION          PDOC00009 
PS00016  1074->1077  RGD                PDOC00016 

Pfam for DKFZphtes3_1n3.1 



WD domain, G-beta repeats 

*MrCHnnWVWCVaFSPDGrWFIvSCSWDgTCRLWD* 
+ GH+N ++++++S D ++ I+++S DGT R+W 
650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK 



HMM_NAME 

HMM 

Query 

HMM 

Query 



Src homology domain 3 

* py VI AL Y DYqAqdpDELS FkEGD 1 1 i 1 1 Eds DD . WW rg Rnnn TNGQEGW 
P+V+ALYDY+A+++DEL++ +GDII + ++++ WW+G GQEG+ 
1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 

IPSNYVEPi* 
+P+N V+ + 
1101 FPANHVASE 1109 

DKFZphtes3_20c21 



group: testes derived 

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

Sequenced by MediGenomix 
Locus: /map="22q11.2-12.2" 
Insert length: 3997 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 

1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 

101 AGTCCCGAAC CCCGAGCACT CGGATGCCTG GCTACTCCGA GCCAAGGCAC 

151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 

201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 

251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 

301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 

351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 

401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 

451 ATGCTGAGTG AAAACTAACC AAGTGGCAGT TGTGACTGAA AACACTGAAA 

501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 

551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 

601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 

651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 

701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 

751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 

801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 

851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 

901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 

951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC CTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC 
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTG CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG CAGTTCAGGC 
2151 TCAGCAAACT GTCAGGGTGC TGCCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT TGCAGATTGG
2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA GACCCAGAGC 
2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC GGGACGGGCT 
3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA TCTCCTGCTG 
3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT GGCATTTTTT 
3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC AACCCCCACA 
3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC GGATCTGTGA 
3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC AGTGCGCTCT 
3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC ACACGGAGAC 
3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG GGCGTCTTCT 
3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG GAGTTTGGGC 
3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT GAGAATTCAG 
3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG GGCCGTGCAG
3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC AAGATGTCTG 
3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGAA AGGGAGCGAC 
3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC CTCAGTTCTA 
3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG ACTGCAGGAT 
3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA GTACTTGAGA 
3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT GAACTGATAT 
3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT 
3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 
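
The poly A stretch and polyadenylation signal positions quoted for this insert can be checked against the printed sequence. The following is a minimal sketch under stated assumptions, not the annotation procedure used to produce this report: the function name find_polya_features, the 15-base run threshold and the use of the canonical AATAAA hexamer are illustrative choices. Applied to the start of the row printed at position 3851, it suggests that the quoted positions (3853 and 3877) are 0-based offsets into the insert.

```python
import re


def find_polya_features(seq, min_run=15):
    """Return (signal_offset, polya_offset) as 0-based offsets, or None if absent.

    Assumptions (not taken from the report): the poly A stretch is the first
    run of at least `min_run` consecutive A's, and the polyadenylation signal
    is the last AATAAA hexamer upstream of that run.
    """
    seq = seq.upper().replace(" ", "")
    run = re.search("A{%d,}" % min_run, seq)
    polya_offset = run.start() if run else None
    upstream = seq if polya_offset is None else seq[:polya_offset]
    signal_offset = upstream.rfind("AATAAA")
    return (signal_offset if signal_offset >= 0 else None), polya_offset


# First 50 bases of the row printed at position 3851 (0-based offset 3850).
tail = "AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA"
signal, polya = find_polya_features(tail)
print(3850 + signal, 3850 + polya)  # 3853 3877
```

In 1-based numbering the same features start at 3854 and 3878, so the indexing convention should be confirmed before comparing positions across tools.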



BLAST Results 



Entry HS1048E9 from database EMBLNEW: 

Human DNA sequence from clone 1048E9 on chromosome 22q11.2-12.2
Contains pseudogene similar to ribosomal protein S3A and part of a gene
similar to C. elegans protein CE02118, ESTs, STS, GSS.
Score = 6540, P = 0.0e+00, identities = 1308/1308
-14 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 
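
The ORF coordinates, reading frame and peptide length quoted in these reports are related by simple arithmetic. The sketch below assumes 1-based, inclusive coordinates that exclude the stop codon; the function name orf_summary is hypothetical. Under that assumption it reproduces the values printed for this clone.

```python
def orf_summary(start, end):
    """Return (peptide_length, frame) for a 1-based, inclusive ORF range.

    Assumes the range covers whole codons and does not include the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1
    return span // 3, frame


print(orf_summary(618, 2741))  # (708, 3)
```

The same relation holds for the other ORFs listed in this section, for example 272 to 2788 bp gives 839 residues in frame 2.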



1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 
51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL
101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 
151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI
201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 
251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 
301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL
351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 
401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS
451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES
501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG
551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS
601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT
651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL
701 LKHGVNLL 
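
The peptide above is the conceptual translation of the ORF in the nucleotide sequence. As a minimal sketch, assuming the standard genetic code, the first ten codons of the ORF (positions 618 to 647 of the insert) can be translated as follows; the helper name translate and the codon-table construction are illustrative and not part of the original report.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks unknown codons
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Positions 618-647 of the insert, grouped by codon.
orf_start = "ATG GCC ACC TCT ACC TCC ACA GAG GCA AAG"
print(translate(orf_start))  # MATSTSTEAK
```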



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 



Report for DKFZphtes3_20c21.3

[LENGTH] 708

[MW] 76900.23

[pI] 5.30

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.36 % 

SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG .XXXXXXXXXXXX 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN 

SEG 

PRD eeeeeeccccccchhhhhhhhheeecccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG

SEG xxxxxxxxxxxxxxxxxxxxx. . , . 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhcccceeee 

SEQ CCNPIQETYFQQLAPAARSSGFPNPQDGAFSLSGKAKQKLLKHGVNLL

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 
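
The LOW_COMPLEXITY figure in this report appears to be the fraction of residues masked with x in the SEG track. This is an inference from the numbers printed here, not a documented PEDANT formula, and the function name low_complexity_percent is hypothetical. The three masked runs in the SEG rows above contain 12, 21 and 12 positions.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Percentage of residues masked (x or X) across all SEG rows."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / peptide_length, 2)


# Masked runs as read from the SEG rows of this report (assumed counts).
seg_rows = ["." + "X" * 12, "x" * 21 + ". . , .", "x" * 12]
print(low_complexity_percent(seg_rows, 708))  # 6.36
```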

(No Prosite data available for DKFZphtes3_20c21.3)
(No Pfam data available for DKFZphtes3_20c21.3)



WO 01/12659 



PCT/IB00/01496



DKFZphtes3_20k2



group: signal transduction 

DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid
receptor subtype 1 (VR1).

VR1 appears to play an important role in the activation and sensitization of nociceptors. It is
the receptor for compounds such as capsaicin, a natural product of capsicum peppers and a
selective activator of nociceptors. The novel protein is the human orthologue of rat VR1.

The new protein can find application as a target for the development of new
nociception-modulating drugs.



strong similarity to rat vanilloid receptor subtype 1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135



1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA 
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT 
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT 
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT 
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG 
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG 
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG 
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC 
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC 
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT 
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT 
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT 
651 GCCAGCATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC 
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC 
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT 
901 CGAGAGACGC AACATGGCCC TGGTCACCCT CCTGGTGGAG AACGGAGCAG 
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGGCGG 
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA 
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG 
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG 
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA 
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG 
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG 
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA 
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC 
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG 
1451 AACTCGGTGC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG 
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT 
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC 
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG 
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG 
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT 
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG 
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG 
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC 
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA 
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC 
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA 
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA 
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT 
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC 
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT 
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA 
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG 
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA 
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC 
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC 
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT 
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT 
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG 
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC 
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG
2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC
2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG
2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG
2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT
2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG
3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG
3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA
3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA
3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT
3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA
3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG
3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG
3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT
3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT
3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC
3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG
3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG
3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA
3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG
3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA
3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA
3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG
3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG
3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC
3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG
4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT
4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG
4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT
4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



99288727: 

Recent advances in neuropharmacology of cutaneous nociceptors. 
99231880:

A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates
rat dorsal root ganglion neurons via interaction at vanilloid receptors.



Peptide information for frame 2 



ORF from 272 bp to 2788 bp; peptide length: 839 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 

1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF 
51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD 
101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK 
151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY 
201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE
251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA 
301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL 
351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI 
401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT 
451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR
501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT
551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE 
601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF 
651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL 
701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW
751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA 
801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK 
No BLASTP hits available



Alert BLASTP hits for DKFZphtes3_20k2, frame 2

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1,
Score = 3760, P = 0

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219



>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds. 
Length = 838



Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 721/839 (85%), Positives = 773/839 (92%)
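
The percentages shown beside the identity and positive counts appear to be truncated rather than rounded: 721/839 is about 85.9% but is printed as 85%. The sketch below illustrates that reading; the function name percent_truncated is hypothetical, and truncation is an inference from the figures in this report rather than a documented property of the BLAST version used.

```python
def percent_truncated(matches, aligned_length):
    """Integer percentage obtained by discarding the fractional part."""
    return 100 * matches // aligned_length


print(percent_truncated(721, 839))  # 85
print(percent_truncated(773, 839))  # 92
```

The same rule reproduces the 29% and 46% reported for the 85/284 and 131/284 counts in the IL-17 receptor alignment later in this section.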

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+++C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300 

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299 

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360 

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420 

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419 

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479 

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++ +GDYFRVTGEI
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEI 479

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659 

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839 

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK 
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838 



Pedant information for DKFZphtes3_20k2, frame 2 
Report for DKFZphtes3_20k2.2



[LENGTH] 839

[MW] 94950.75

[pI] 6.90

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus
vanilloid receptor subtype 1 mRNA, complete cds. 0.0

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05

[PIRKW] alternative splicing 3e-06

[PIRKW] peripheral membrane protein 3e-06

[SUPFAM] ankyrin repeat homology 3e-06

[SUPFAM] unassigned ankyrin repeat proteins 3e-06 

[PFAM] Ank repeat 

[KW] TRANSMEMBRANE 4 



SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh 

MEM 

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 



SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS 

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec 

MEM MMMMMMMMMMMMMMMMM. 

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MM 

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 



(No Prosite data available for DKFZphtes3_20k2.2)
Pfam for DKFZphtes3_20k2.2

HMM *GyTPLHIAARyNNvEMVrlLLQKGADIN*

G+T+LHIA +++N+ +V LL+++GAD+
Query 202 GQTALHIAIERRNMALVTLLVENGADVQ 229
DKFZphtes3_2013



group: transmembrane protein 

DKFZphtes3_2013 encodes a novel 595 amino acid protein with partial similarity to the IL-17 
receptor. 

The novel protein contains one transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp

Poly A stretch at pos. 2345, no polyadenylation signal found 



1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 

51 AGCCAGCAGA AACAGTGGCC TGTACAACAT CACCTTCAAA TATGACAATT 

101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 

151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 

201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 

251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 

301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 

351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 

401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 

451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG

501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 

551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 

601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 

651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 

701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA

751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG 

801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 

851 TCGCGACGCT CTTCACTGTG ATGTGCCGCA AGAAGCAACA AGAAAATATA 

901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 

951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 

1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 

1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 

1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 

1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 

1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 

1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 

1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 

1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 

1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 

1451 CCCGAGACCA CGGCCTCCAG GAGCCCCGGC AGCACACGCG ACAGGGCAGC 

1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 

1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 

1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 

1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 

1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 

1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 

1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 

1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG

1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG 

1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 

2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC CTTCCTTCCA 

2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 

2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA 

2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 

2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 

2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA 

2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 

2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2401 AAAAAA 



BLAST Results 



No BLAST result



Medline entries



No Medline entry



Peptide information for frame 1 



ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 

1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL 
51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR 
101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW 
151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT
201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW 
251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS
301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL
351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV 
401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK
451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ 
501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE 
551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL 



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2013, frame 1 

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P =
1.1e-13



>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds.
Length = 866

Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14
Identities = 85/284 (29%), Positives = 131/284 (46%)



Query: 213 / Sbjct: 379
Query: 269 / Sbjct: 438
Query: 325 / Sbjct: 498
Query: 384 / Sbjct: 551
Query: 435 / Sbjct: 611

[Aligned residues are largely lost in this copy; surviving match-line fragments follow.]

IIV+CS+G +
++ YF + SC+GDVP +
S NY RS GR L A+
PS CL ++ V G G A
+D + +PG+
-LQPRGQPAP 662



Pedant information for DKFZphtes3_2013, frame 1 



Report for DKFZphtes3_2013.1

[LENGTH] 595

[MW] 66847.05

[pI] 6.27

[HOMOL] TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus interleukin 
17 receptor mRNA, complete cds. 2e-14 

[BLOCKS] BL00740A MAM domain proteins 

[BLOCKS] BL01224B N-acetyl-gamma-glutamyl-phosphate reductase proteins 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 13.61 %



SEQ MESQPFLNMKFETDYFVKVVPFPSIKNESNYHPFFFRTRACDLLLQPDNLACKPFWKPRN 

SEG 

PRD ccccccccccccccceeeeeccccccccccceeeeeeceeeeeeeccccccccccccccc 

MEM 

SEQ LNISQHGSDMQVSFDHAPHNFGFRFFYLHYKLKHEGPFKRKTCKQEQTTEMTSCLLQNVS 

SEG 

PRD eeeecccccceeeecccccccceeeeeehhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ PGDYIIELVDDTNTTRKVMHYALKPVHSPWAGPIRAVAITVPLVVISAFATLFTVMCRKK 

SEG 

PRD ccceeeeeeccccccccccccccccccccccccceeeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ QQENIYSHLDEESSESSTYTAALPRERLRPRPKVFLCYSSKDGQNHMNVVQCFAYFLQDF

SEG xxxxxxx xxxxxxxxxx 

PRD hhhhhhhhhcccccccceeeeccccccccccceeeeeeecccccchhhhhhhhhhhhhhc 

MEM 

SEQ CGCEVALDLWEDFSLCREGQREWVIQKIHESQFIIVVCSKGMKYFVDKKNYKHKGGGRGS 

SEG xxxxxxxxx 

PRD ccchhhhhhhhccccccccchhhhhhhhhhheeeeeeeeccceeeeeccccccccccccc 

MEM 

SEQ GKGELFLVAVSAIAEKLRQAKQSSSAALSKFIAVYFDYSCEGDVPGILDLSTKYRLMDNL 

SEG xxx xxxxxxxxxxxxxxx 

PRD ccceeeeehhhhhhhhhhhhhhcchhhhhhhheeeeccccccccccccccchhhhhhccc 

MEM 

SEQ PQLCSHLHSRDHGLQEPGQHTRQGSRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQF 

SEG 

PRD cchhhhhhcccccccccccccccccceeeeccccccceeeeeeceeeecccccceeeeee 

MEM 

SEQ VPFHPPPLRYREPVLEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQH 

SEG 

PRD eecccccccccceeeeeccccceeeeecccccccccchhhhhhhhhhccccccccccccc 

MEM 

SEQ GGLDQDGEARPALDGSAALQPLLHTVKAGSPSDMPRDSGIYDSSVPSSELSLPLMEGLST 

SEG xxxxxxxxxxxxxxxxx . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccchh 

MEM 

SEQ DQTETSSLTESVSSSSGLGEEEPPALPSKLLSSGSCKADLGCRSYTDELHAVAPL 

SEG . . xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhheeecccccccccccccccceeeccccceeeccccccccceeeeccc 

MEM 



(No Prosite data available for DKFZphtes3_2013.1)
(No Pfam data available for DKFZphtes3_2013.1)
DKFZphtes3_20m18



group: nucleic acid management 

DKFZphtes3_20m18 encodes a novel 132 amino acid protein with similarity to the S. cerevisiae
mitochondrial carrier protein RIM2.

The novel protein contains a leucine zipper and a Prosite mitochondrial energy transfer
proteins signature. It is a member of a family of substrate carrier proteins that are found in
the inner mitochondrial membrane and are involved in energy transfer. The RIM2/MRS12 gene
encodes a predicted protein of 377 amino acids that is essential for mitochondrial DNA
metabolism and proper cell growth. Inactivation of this gene causes total loss of
mitochondrial DNA and, compared to wild-type rho0 controls, a slow-growth phenotype on media
containing glucose. The novel protein seems to be the human orthologue of this protein.

The new protein can find application in modulation of mitochondrial DNA replication and 
maintenance. 



similarity to carrier protein RIM2 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 3572 bp

Poly A stretch at pos. 3530, polyadenylation signal at pos. 3510



1 GCCGCGGGGA GGGCTGTGCC GGTTGCTTTC TGCAGCCGCA TCTCGGCCAG 
51 CTCTCCTCGC CGTCCCCGGG GCGCTGTGCG TCTCCAGTCC GGGACCGAAG 
101 CCGCCTGCCG TAGCGGGCGG CCAGATCCGC GTCCCGCCTC AGCGGCCGGA 
151 GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT 
201 TGCCGGAGGA TGTGGTGGTA CAGTGGGAGC TATTCTGACA TGTCCACTGG 
251 AAGTTGTAAA AACACGACTG CAGTCATCTT CTGTGACGCT TTATATTTCT 
301 GAAGTTCAGC TGAACACCAT GGCTGGAGCC AGTGTCAACC GAGTAGTGTC 
351 TCCCGGACCT CTTCATTGCC TAAAGGTGAT CTTGGAAAAA GAAGGGCCTC 
401 GTTCCTTGTT TAGAGGACTA GGCCCCAATT TAGTGGGGGT AGCCCCTTCC 
451 AGAGCAATAT ACTTTGCTGC TTATTCAAAC TGCAAGGAAA AGTTGAATGA 
501 TGTATTTGAT CCTGATTCTA CCCAAGTACA TATGATTTCA GCTGCAATGG 
551 CAGGTATGAA TGTATAATAT TAAAAAAAAA AAAAACTTTC TGAAACCTAG 
601 AGGCTTAATA TTGAATTATA AGTTTGTAGT GAAAAGTTGA TGATTAATGT 
651 GCTTTTCATT GATTAGATGA TTTTTACGTT TATCGATATA AACCAAATTA 
701 GGTATATGTA AAATCTGTCA TCAGTTGACA TTTTTGTAGT CAGGAGTTTA 
751 CATGCTAGGG TACAAGTAAT ATATTTATAT TGCCTTGTGT AGTCCACTGA 
801 ATGTTTAGTG ATCATTGTTA ACAGTTTTAA GAATCCAACC ATAATTACAC 
851 TATAAATAAG TTATGGAGCT GTAATTTACT CTTCTCTCCT CAATTTCTGT
901 TAGTGCCTTT TCCCTTTTTG CTGCATGTTT TGGCTTCTGT CTGAAATGTG 
951 TCGGCAATTC TTGGTAAAGT ATTCATTTTG TCCTGTGCTC AAATGCTGAA 
1001 ATTTTTGTGA GTGATGTATT ATTATTGACA ATTCAGTTAC TATGTGTATT 
1051 TTTTAAAATT GTTTATTATT CTACATAATT CACACTAGAC AGCACCTGAA 
1101 ATTTAGACAC TGGCTATGTG TACATGCTTA CTATAGAAAT GTTTCCAGGA 
1151 ACTCTCTGTT TCTGTCATCA CTGATAAGTA TATATGATTC TGAATTAAAA 
1201 TAACTAGTTT TAGGTCTTTA CCCTGCCATA AAGATAAACA GTTGGTTTGA 
1251 CCAATCTGGT TCTGGAATCA TTTGCTGCTA TGCATGTTAG ACAAAGCCAC
1301 GAACTTTGAT TTTCCATTGA AAATTCTCCC TAATATCTGA GATTTATTGT 
1351 ATATTTACTC ATATCTCACA TTTTCAAATT ATGCTGTAAC TTTATAAACT 
1401 GTAGCTGCTT TCATCAGCTA TTGATCAATA AATTGAATGT CAATTATGTG
1451 CTTAATAATG AGTGCCTTAA ACTGTTAAAC ACTTTTGGTT TAGAAATAAA
1501 GTGAATCAAT TTGACCTATA TACTTCATGA AGTAAGTAAG TTTGAAATAC 
1551 AAATTTCTGA AAGGTCAATA GCCCTTATCG TATTACAAAT TGTTTTTAAG 
1601 GCTTTTTGTA TTTATTAATT GTCAGTTGAT TCACTGAAGC TTTAAAACTG 
1651 GAAGGGACAA TCCAAAGGTC AAAAGAGTGA AATACAATCA TTTACCAATA 
1701 AGGAAACCTT GGGCAAATTA TGTAATTTAT GTGAACCTCT CTTAGCTTAC 
1751 CCATGGAATG AGTCAAGTGG TCTACATAGA TTTGGATTTT GAGAATTAGT 
1801 TCTTTCATTT AGTGTTATAG AGATTATCTT GTTACAACTA GAATTATTTT 
1851 TAATGTAATT TTTACAGATG TTGAATATTA GTAGATAGGA TTTTTCCCCT 
1901 ACGAATTTGG ATGTAAGGTA AAGGTTGGTG GCCAGTGACA AACCTTATAA 
1951 CCACTTTATC AGGTTCTTTA AAAATATATT TGTGAATTAC CAGTGATTAT 
2001 GTTTTTGGCT TATAACCTCA GATAATTATA AAGAAATGTT AATCTTATTT 
2051 GAAAGAATTG GAATCTAGAA AGTTAGATGA GCAGTCATTT TATATTGATA 
2101 TTTGTTATAT CAGTATAGCA AATGCAGAGG TTCAGAATAT CTTTATTTCC 
2151 ACTGGAACAT CTTATTTCAT TAGAGTATCT CATCAGAATT TATTACTGTA 
2201 TTTGTATCAC ATTGCAAAGA ATTTCAGTAG AATTGTCAGT TTGCACTTTT 
2251 TTCTCAAATG TGTACAAATG TTAACATATA GTTCATTTTT ATCTGTACAT 
2301 TGATGCCATT TCCCAACTTG AATTCCTCAA GTTTTGGTAA ACTTACAATC 
2351 TCATACTTGT TCAGAGGTTA TTGCACTGTA CACTTACTGT GTAGAAAATA 
2401 CTGTTTGAAT TTGTTTGCAG TTACATTGTT CTGAGAACTG TGCTCTCAGA 
2451 GCTTCTGTGC ACTATTCATG AGCATTAACA CTTAGCCTTG CAGTTTTATA 
2501 CATAACTATA TGGTTAGTAA AACTGAATGG TCCAATGCAG ACTCATTAAA 
2551 GTAGGCTTTT GCCCCCTTTG TTCTTGAAAT AATCTAGACC AGATTACTCG 
2601 GGGTTTTTTT TAGGATTATT TTTATAGGTC TAAATATGAA TGATTTGGGG 
2651 GTATGAAGTA CTTAAAGATA GTTCTGTGAA AAATCATTTT CAGCTGTCTA 
2701 TTCAAGGGAA AAAATGCTAA CCTTGTCACT TTACTACACA AAACCACACT 
2751 AAAATAAACC ATTAATGATA CTGCCTGCAA GATTTTAACA CACCAGATAG 
2801 CACACACATT AAGGATTTAT AAGGCACTGT ACGTAATTTT TATTCCAAGT 
2851 GACCTCTCAA TTCATTTTCA TTTTGCATTT TATCCATATG AACTCATGTT 
2901 TAATTTAGAT AATAAAAATT TATTTTATTA AAAGGACAGT TTATTTAAAG 
2951 TGGGTCTTTT TATTTGTTGT AGTGCATACT ATAAGAATTT GTAAGCCTCT 
3001 AAAGTTGAGC TATAAATTTT CATGCATTAA AAATTTGTTT CAGTTGTGAG 
3051 GATATTTAAT CAGATTAAAT AATGTTGACT CTTAATATTT TGCCTGCCTT 
3101 TTTTTTCTCC TACACATGAC CTTTGACAGA CTAAGTATAT CTCAGCTATT 
3151 GAGGGTATCT GTTTTGTTGC CTGTATATTT TGTTTAAATT AACTTGTATA 
3201 TTCCTTTGTA TACACCTAGG CACAGATGTA TGCAAAAAAA ATTTGTTAAA 
3251 TTACTTCTTT CTTTATACTA ATTCTCAATT TTTAAAAGAT TTTATCTGGC 
3301 ATGTATATAC TTTTATATAG AACATTATAA ATGTAAAGGA AATGAATTCT 
3351 AATTTTAATT GGATTATGTA TTCATACAGT TATTCTCAAT TTTTAAAATA 
3401 CTAATAATGT AATCATTGAA TGTTTCCTAC ATACGTAGTG GGTTTTATTT 
3451 GCTCACAGCA TACAGTTATT TTTCAATTTA TGTTTTTCTA TTAGACTTAA 
3501 ATTTCATTAT AATAAAGGCT TTTACTCATT AAATACAAAA AAAAAAAAAA 
3551 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
95198680: 

Overexpression of a novel member of the mitochondrial carrier family rescues defects in both 
DNA and RNA metabolism in yeast mitochondria. 



Peptide information for frame 1 



ORF from 169 bp to 564 bp; peptide length: 132 
Category: similarity to known protein 
Classification: Intracellular transport and traffic 
Prosite motifs: LEUCINE_ZIPPER (27-49) 
MITOCH_CARRIER (26-36) 



1 MSQRDTLVHL FAGGCGGTVG AILTCPLEVV KTRLQSSSVT LYISEVQLNT 
51 MAGASVNRVV SPGPLHCLKV ILEKEGPRSL FRGLGPNLVG VAPSRAIYFA 
101 AYSNCKEKLN DVFDPDSTQV HMISAAMAGM NV 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20ml8, frame 1 

PIR:S44092 probable carrier protein c2 - Caenorhabditis elegans, N = 2, 
Score = 147, P = 1.5e-19 

PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 230, P = 6.2e-19 



>PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) 
Length = 377 

HSPs: 

Score = 230 (34.5 bits), Expect = 6.2e-19, P = 6.2e-19 
Identities = 55/133 (41%), Positives = 80/133 (60%) 

Query:   8 VHLFAGGCGGTVGAILTCPLEVVKTRLQSSS-VTLYISEVQLNTMAGA----SVNRVVSP 62 
           VH  AGG GG  GA++TCP ++VKTRLQS   +  Y S+  +N   G+    S+N V+ 
Sbjct:  54 VHFVAGGIGGMAGAVVTCPFDLVKTRLQSDIFLKAYKSQA-VNISKGSTRPKSINYVIQA 112 

Query:  63 GP LHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFD--P 115 
           G L + + ++EG RSLF+GLGPNLVGV P+R+I FY K+ F+ 
Sbjct: 113 GTHFKETLGIIGNVYKQEGFRSLFKGLGPNLVGVIPARSINFFTYGTTKDMYAKAFNNGQ 172 

Query: 116 DSTQVHMISAAMAG 129 
           ++  +H+++AA AG 
Sbjct: 173 ETPMIHLMAAATAG 186 

Score = 77 (11.6 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 25/88 (28%), Positives = 39/88 (44%) 

Query: 3 QRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSvTLYISEVQLNTMAGASVNRVVSP 62 

Q ++HL A G A T P+ ++KTR VQL+ SV + + 

Sbjct: 172 QETPMIHLMAAATAGWATATATNPIWLIKTR VQLDKAGKTSVRQYKNS 219 

Query: 63 GPLHCLKVILEKEGPRSLFRGLGPNLVG 90 

CLK ++ EG L++GL + +G 
Sbjct: 220 WD--CLKSVIRNEGFTGLYKGLSASYLG 245 

Score = 71 (10.7 bits), Expect = 6.6e+00, P = 1.0e+00 
Identities = 28/91 (30%), Positives = 45/91 (49%) 

Query: 12 AGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSPGPLHCLKVI 71 

+ G V +I T P EVV+TRL+ + +NG R+G+ KVI 

Sbjct: 294 SAGLAKFVASIATYPHEVVRTRLRQTP KEN-- G-- KRKYT-GLVQSFKVI 338 

Query: 72 LEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

+++EG S++ GL P+L+ P+ I F + 
Sbjct: 339 IKEEGLFSMYSGLTPHLMRTVPNSIIMFGTW 369 

Pedant information for DKFZphtes3_20ml8, frame 1 



Report for DKFZphtes3_20ml8 . 1 

[LENGTH] 132 

[MW] 13993.36 

[pI] 8.42 

[HOMOL] PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast (Saccharomyces 
cerevisiae) 7e-19 

[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YEL006w] 1e-09 

[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. 
cerevisiae, YIL006w] 3e-09 

[FUNCAT] 07.04.07 anion transporters (cl, so4, po4, etc.) [S. cerevisiae, YKL120w] 2e-08 

[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YPR011c] 3e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 4e-08 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 2e-07 

[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YJR077c] 7e-05 

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YJR077c] 7e-05 

[BLOCKS] BL00215B Mitochondrial energy transfer proteins 

[BLOCKS] BL00215A Mitochondrial energy transfer proteins 

[PIRKW] duplication 6e-09 

[PIRKW] transmembrane protein 6e-09 

[PIRKW] mitochondrial inner membrane 4e-07 

[PIRKW] transport protein 5e-06 

[PIRKW] mitochondrion 7e-08 

[PIRKW] chloroplast 3e-08 

[SUPFAM] Btl protein 3e-08 

[SUPFAM] ADP, ATP carrier protein repeat homology 4e-09 

[SUPFAM] Caenorhabditis probable carrier protein c2 4e-09 

[SUPFAM] probable carrier protein YPR021c 6e-09 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MITOCH_CARRIER 1 

[PFAM] Mitochondrial carrier proteins 

[KW] Alpha_Beta 



SEQ MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVV 

PRD cccccceeeecccccccceeeeeecchhhhhhhhhhhccccccccccccccccccccccc 

SEQ SPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFDPDSTQV 

PRD cccchhhhhhhhhhcccceeeeccccceeeecccceeeeeehhhhhhhhhcccccccccc 

SEQ HMISAAMAGMNV 

PRD chhhhhhhcccc 



Prosite for DKFZphtes3_20ml8 . 1 

PS00029 27->49 LEUCINE_ZIPPER PDOC00029 

PS00215 26->36 MITOCH_CARRIER PDOC00189 
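
The two Prosite hits above can be reproduced with a simple pattern scan. The sketch below is illustrative only: it assumes the leucine-zipper pattern is L-x(6)-L-x(6)-L-x(6)-L and the N-glycosylation pattern is N-{P}-[ST]-{P}, rewritten here as regular expressions, and it assumes the second coordinate in the report is one past the last matched residue (inferred from the reported range 27->49 for a 22-residue pattern). The names `PATTERNS` and `scan` are hypothetical helpers, not part of any Prosite tool.

```python
import re

# Assumed PROSITE-style patterns rewritten as regular expressions.
PATTERNS = {
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",   # L-x(6)-L-x(6)-L-x(6)-L
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",   # N-{P}-[ST]-{P}
}


def scan(peptide, pattern):
    """Return (start, end) pairs: 1-based start, end one past the last residue."""
    hits = []
    # The lookahead lets overlapping occurrences be reported as well.
    for m in re.finditer(r"(?=(%s))" % pattern, peptide):
        start = m.start(1) + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# Frame 1 peptide of DKFZphtes3_20ml8 as listed in this record.
peptide = (
    "MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNT"
    "MAGASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFA"
    "AYSNCKEKLNDVFDPDSTQVHMISAAMAGMNV"
)

for name, pattern in PATTERNS.items():
    for start, end in scan(peptide, pattern):
        print(f"{name} {start}->{end}")
```

Under these assumptions the scan reports the leucine zipper at 27->49 and no N-glycosylation site, which agrees with the report for this peptide.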



Pfam for DKFZphtes3_20ml8 . 1 



HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGrnMeHTvMFPIDtIKTRMQlQgEMpM. .ahpR 

+++++++AGG +G + +++++P++++KTR+Q++ ++ + ++ 
Query 5 DTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSS-SVTLYISEVQLNTMA 52 

HMM YkGMIdCFRwIwkNEGWRGLWRGLgANvIRYI PqWalRFGFY 

G+++C++ I+++EG+R+L+RGLG+N+++++P +AI+F+ Y 
Query 53 GASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

HMM EFMKeMFiDyfgeddnyWmWFwmnYMaGs* 

+KE ++D F++ D++++++ + +MAG+ 
Query 103 SNCKEKLNDVFDP-DSTQVHMISAAMAGM 130 

group: signal transduction 

DKFZphtes3_21d4 encodes a novel 464 amino acid putative GTP exchanging factor related to RCC1. 

RCC1 (regulator of chromosome condensation) is a eukaryotic protein which binds to chromatin 
and interacts with ran, a nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP 
with GTP, acting as a guanine-nucleotide dissociation stimulator. 

The new protein can find application in the regulation of gene expression by activation of 
nuclear GTP-binding proteins. X-linked retinitis pigmentosa results from a defective GTPase 
regulator that contains an RCC1-type repeat. 



similarity to RCC1-like G exchanging factor RLG 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="20" 
Insert length: 2321 bp 

Poly A stretch at pos. 2293, polyadenylation signal at pos. 2262 
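
The reported positions can be checked against the sequence with a plain substring search. The following is a minimal sketch, not the annotation pipeline used for this listing: `find_signals` is a hypothetical helper, the hexamers AATAAA and ATTAAA are assumed to be the signals searched for, and the 50 bases are copied from the row starting at position 2251 of the sequence of this clone. In this listing the reported positions appear to coincide with 0-based offsets; that is an observation from the numbers, not a documented convention.

```python
def find_signals(seq, offset=0, motifs=("AATAAA", "ATTAAA")):
    """Return sorted (0-based position, motif) pairs for every motif occurrence."""
    hits = []
    for motif in motifs:
        i = seq.find(motif)
        while i != -1:
            hits.append((offset + i, motif))
            i = seq.find(motif, i + 1)
    return sorted(hits)


# Bases 2251..2300 of the DKFZphtes3_21d4 insert (1-based numbering of the listing).
tail = "AATCTGTCTTGGAATAAAGTCTATTTTCTGCCTTTTGGTTTTTAAAAAAA"
offset = 2250  # 0-based offset of the first base in `tail`

signals = find_signals(tail, offset)
poly_a_start = offset + tail.find("A" * 7)  # first run of seven A's

print(signals)       # hexamer hits as 0-based offsets
print(poly_a_start)  # start of the poly A stretch as a 0-based offset
```

With these inputs the hexamer is found at offset 2262 and the poly A run at offset 2293, matching the two positions stated above.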



1 GGGTCACGCA AGATGGCGGC GCCCAGAGGC TGCTGAGGCG CGGAACGGAG 
51 GATGGCGCTG GTGGCGTTGG TGGCTGGGGC TCGGCTGGGG CGGCGGCTGA 
101 GCGGGCCGGG GCTGGGGCGA GGGCACTGGA CGGCGGCCAG GCGCTCCCGG 
151 AGCCGGCGCG AAGCGGCAGA AGCCGAGGCG GAGGTGCCCG TGGTCCAGTA 
201 CGTGGGCGAG CGCGCTGCCC GCGCCGATCG CGTCTTCGTG TGGGGCTTCA 
251 GCTTCTCGGG GGCGCTGGGC GTGCCTTCCT TTGTGGTGCC CAGCTCCGGG 
301 CCCGGGCCCC GCGCCGGCGC CCGACCGCGC CGCAGGATCC AGCCCGTGCC 
351 CTATCGCCTG GAGCTGGACC AAAAGATTTC ATCTGCTGCT TGCGGCTATG 
401 GATTCACACT GCTGTCCTCT AAGACTGCGG ATGTTACGAA AGTCTGGGGG 
451 ATGGGACTCA ACAAAGATTC TCAGCTTGGA TTTCACAGGA GCCGGAAAGA 
501 TAAAACGAGG GGCTACGAGT ATGTGTTGGA GCCCTCACCC GTCTCCCTGC 
551 CTCTGGACAG ACCTCAGGAG ACACGGGTGC TGCAGGTCTC CTGCGGCCGA 
601 GCTCACTCTC TTGTGTTGAC TGACAGGGAA GGAGTCTTCA GCATGGGAAA 
651 CAATTCTTAT GGGCAATGTG GAAGAAAGGT GGTCGAAAAT GAAATTTACA 
701 GTGAAAGTCA CAGAGTCCAC AGGATGCAGG ACTTCGATGG CCAGGTGGTC 
751 CAGGTCGCCT GTGGTCAGGA TCATAGTCTG TTCCTGACGG ATAAAGGAGA 
801 AGTCTATTCT TGTGGATGGG GTGCTGATGG GCAAACAGGT CTGGGTCACT 
851 ACAATATCAC CAGCTCGCCC ACCAAGCTGG GTGGAGACCT GGCGGGAGTG 
901 AACGTTATCC AAGTTGCCAC CTACGGTGAT TGCTGCCTGG CCGTGTCCGC 
951 CGACGGAGGA CTTTTTGGTT GGGGAAACTC GGAGTACCTG CAGCTGGCCT 
1001 CTGTCACTGA CTCCACACAG GTGAATGTGC CCCGCTGCTT ACACTTCTCA 
1051 GGAGTGGGGA AGGTGCGACA GGCTGCATGC GGTGGCACGG GCTGTGCAGT 
1101 GTTAAACGGA GAAGGACATG TTTTTGTCTG GGGCTATGGA ATTCTTGGGA 
1151 AAGGTCCAAA CCTAGTGGAA AGTGCCGTCC CTGAAATGAT TCCACCCACT 
1201 CTCTTTGGCT TGACGGAGTT CAACCCAGAA ATCCAGGTTT CCCGCATCCG 
1251 ATGTGGACTC AGCCACTTTG CTGCACTGAC CAACAAAGGA GAGCTGTTTG 
1301 TATGGGGCAA GAACATCCGA GGGTGCCTGG GAATCGGTCG CCTGGAGGAC 
1351 CAGTATTTCC CATGGAGGGT GACGATGCCT GGGGAGCCTG TGGACGTGGC 
1401 ATGTGGCGTG GACCACATGG TGACCCTGGC CAAGTCATTC ATCTAAACCT 
1451 CCCTCACCTG CTTGGGCGGC CCCGTCCCGG GAACCACTGG CACTCCTTGG 
1501 CAGAGGCCAG CGCGTGGCCA GCCCCCCGGG GTTCTTGGAT GGTGGTGGCG 
1551 GAGGACCCTG CGTGCAGTGT GACGCTCTGT CCTGAATCCC TTAGCGGGTA 
1601 CCTACCAGGA GGATCAGGGC AAGGTCCCTC TCCAGCTGCA GGTGAGGCCT 
1651 GCGGAACTCA GCTTGGATGG CAGCCTTTGG TGGGCCGCTG TGGCCCGCAC 
1701 GTCTCTGTTC TCTCCAAGTA ACATGCGACG GTGTCTGGTG TCACGTCTCG 
1751 CCTGAGAAGC CCGTCTTAGG AAAGCTTAGC TTGAACACAG TGCTCGGGAG 
1801 GTTTCTGCTC TGTCTGTCAT GGCAGTCTCT TGGTTTGTGT CTGGCCAAGG 
1851 CCATGCGTGT GCCTCGGACC GAGCCCCAGC TTAGGCGAGG GAGTCAGGCT 
1901 GGCTTCGGCC CTCGGTTTTC ATTCAGGCCA CCCTGCTCAT GGCCCTTCCT 
1951 GGCCGCCTGC CACACCGCAA GCTCGCTGGG GGGACACTAG AAGCACCGTG 
2001 GCCTGGGATT CCATCTGGAG CTGTCCGCAG GCACCAGCCC CAGCCTCCCA 
2051 CCACGCTCAC TGCCTGGCTT GGAAAAGTTA AGAAGCCCCT CAGGAAGAGA 
2101 ATCGAGGCTA AGTTCCTCTG CGCCGAGGGC CCCGAGCATA TCCGCCAAGG 
2151 CTCAGCTGCA GTGCCAGGCG GAGGAGGAAG ATCCAGAAAT TGTGAACAAT 
2201 GTTTGATTTA GTAGCGTGAC TTGCCTTTCC CTTTAAAAAC ATCTTTTACA 
2251 AATCTGTCTT GGAATAAAGT CTATTTTCTG CCTTTTGGTT TTTAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 

Entry HS203358 from database EMBL: 
human STS SHGC-31781. 
Score = 1748, P = 1.1e-72, identities = 376/394 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 52 bp to 1443 bp; peptide length: 464 
Category: similarity to known protein 
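
The peptide length follows directly from the ORF coordinates, and the first residues can be re-derived from the cDNA. The sketch below assumes inclusive, 1-based ORF coordinates that exclude the stop codon (which is what the stated lengths imply) and the standard genetic code; `peptide_length` and `translate` are hypothetical helper names, and the bases are copied from the row starting at position 51 of the sequence above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate(dna):
    """Translate complete codons of `dna` in frame, one letter per residue."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 51..100 of the DKFZphtes3_21d4 insert; the ORF starts at base 52.
row_51 = "GATGGCGCTGGTGGCGTTGGTGGCTGGGGCTCGGCTGGGGCGGCGGCTGA"

print(peptide_length(52, 1443))   # codons in the ORF
print(translate(row_51[1:31]))    # first ten residues of frame 1
```

The same length check applies to the other records in this listing, for example `peptide_length(169, 564)` for the 132-residue peptide.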



1 MALVALVAGA RLGRRLSGPG LGRGHWTAAR RSRSRREAAE AEAEVPVVQY 

51 VGERAARADR VFVWGFSFSG ALGVPSFVVP SSGPGPRAGA RPRRRIQPVP 

101 YRLELDQKIS SAACGYGFTL LSSKTADVTK VWGMGLNKDS QLGFHRSRKD 

151 KTRGYEYVLE PSPVSLPLDR PQETRVLQVS CGRAHSLVLT DREGVFSMGN 

201 NSYGQCGRKV VENEIYSESH RVHRMQDFDG QVVQVACGQD HSLFLTDKGE 

251 VYSCGWGADG QTGLGHYNIT SSPTKLGGDL AGVNVIQVAT YGDCCLAVSA 

301 DGGLFGWGNS EYLQLASVTD STQVNVPRCL HFSGVGKVRQ AACGGTGCAV 

351 LNGEGHVFVW GYGILGKGPN LVESAVPEMI PPTLFGLTEF NPEIQVSRIR 

401 CGLSHFAALT NKGELFVWGK NIRGCLGIGR LEDQYFPWRV TMPGEPVDVA 

451 CGVDHMVTLA KSFI 



BLASTP hits 



Entry CEW09G3_5 from database TREMBLNEW: 

gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 

Score = 395, P = 9.3e-37, identities = 111/330, positives = 165/330 

Entry Y032_HUMAN from database SWISSPROT: 
HYPOTHETICAL PROTEIN KIAA0032. 

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry B38919 from database PIR: 
hypothetical protein 2 - human (fragment) 

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry AF060219_1 from database TREMBLNEW: 

product: "RCC1-like G exchanging factor RLG"; Homo sapiens RCC1-like G 
exchanging factor RLG mRNA, complete cds. 

Score = 273, P = 4.0e-21, identities = 84/262, positives = 124/262 

Entry S71752 from database PIR: 

giant protein p619 - human 

Score = 282, P = 1.1e-19, identities = 86/287, positives = 144/287 



Alert BLASTP hits for DKFZphtes3_21d4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_21d4, frame 1 



Report for DKFZphtes3_21d4 . 1 



[LENGTH] 464 
[MW] 49997.08 
[pI] 8.74 
[HOMOL] TREMBL:CEW09G3_5 gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 5e-34 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. 
cerevisiae, YGL097w] 2e-09 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.03.03 trna processing [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL097w] 2e-09 
[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins 
[PIRKW] blocked amino end 3e-16 
[PIRKW] nucleus 3e-16 
[PIRKW] duplication 4e-08 
[PIRKW] tandem repeat 3e-16 
[PIRKW] DNA binding 3e-16 
[PIRKW] mitosis 3e-16 
[PIRKW] leucine zipper 3e-21 
[SUPFAM] pheromone response pathway component SRM1 4e-08 
[SUPFAM] WD repeat homology 3e-21 
[PROSITE] MYRISTYL 7 
[PROSITE] RCC1_2 2 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 5 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] GLYCOSAMINOGLYCAN 3 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Regulator of chromosome condensation (RCC1) 
[KW] All_Beta 
[KW] LOW COMPLEXITY 13.58 % 





SEQ MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR 
SEG .xxxxxxxxxxxxxxxxxxxxxxx . . .xxxxxxxxxxxxxxxxx 
PRD ccchhhhhhhhhheeeccccccccchhhhhhhhhhhhhhhhhhhceeeeeehhhhhhhhh 

SEQ VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD eeeeccccccccccceeeeeccccccccccccccccccccchhhhhhhheeeccccceee 

SEQ LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS 
SEG 
PRD eecccccceeeeccccccccccccccccccccccceeeeeccccccccccccccceeeee 

SEQ CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD 
SEG 
PRD cccceeeeeeccceeeeeccccccccccccccccccccccccccccccceeeeeeecccc 

SEQ HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA 
SEG 
PRD eeeeeecccceeeecccccccccccccccccccccccccccceeeeeeecccceeeeeec 

SEQ DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW 
SEG 
PRD ccceeeeccccccccccccccccccccccccccccceeeeeccccceeeeeecccceeee 

SEQ GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK 
SEG 
PRD cccccccccccccccccccccceeeeeeecccceeeeeeecccceeeeeecccceeeecc 

SEQ NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI 
SEG 
PRD cccccccccccccccccceeecccceeeeecccccccccccccc 



Prosite for DKFZphtes3_21d4 . 1 



PS00001 200->204 ASN_GLYCOSYLATION PDOC00001 
PS00001 268->272 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 82->86 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 333->337 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005 
PS00005 219->222 PKC_PHOSPHO_SITE PDOC00005 
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005 
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005 
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00006 290->294 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00007 209->217 TYR_PHOSPHO_SITE PDOC00007 
PS00007 208->217 TYR_PHOSPHO_SITE PDOC00007 
PS00008 9->15 MYRISTYL PDOC00008 
PS00008 20->26 MYRISTYL PDOC00008 
PS00008 133->139 MYRISTYL PDOC00008 
PS00008 238->244 MYRISTYL PDOC00008 
PS00008 277->283 MYRISTYL PDOC00008 
PS00008 302->308 MYRISTYL PDOC00008 
PS00008 344->350 MYRISTYL PDOC00008 
PS00009 12->16 AMIDATION PDOC00009 
PS00009 206->210 AMIDATION PDOC00009 
PS00626 179->190 RCC1_2 PDOC00544 
PS00626 235->246 RCC1_2 PDOC00544 



Pfam for DKFZphtes3_21d4 . 1 



HMM_NAME Regulator of chromosome condensation (RCC1) 

HMM *IAaGqHHTVCLTqDGRVYtWG* 

+A GQ+H++ LT++G VY++G 
Query 235 VACGQDHSLFLTDKGEVYSCG 255 



DKFZphtes3_21j15 



group: transcription factors 

DKFZphtes3_21j15 encodes a novel 898 amino acid protein with similarity to human NY-CO-33 
protein. 

NY-CO-33 is a protein recognised by autologous antibodies of human colon cancer patients. The 
novel protein contains 4 C2H2 zinc fingers and is a new putative transcription factor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



strong similarity to "NY-CO-33" 

complete cDNA, complete cds, potential start at bp 27, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 4407 bp 

Poly A stretch at pos. 4321, polyadenylation signal at pos. 4301 



1 CGCTGCAGCA GGTGTCACAG AGCCGCATGC TCCCGGAGCC CAGCCTCTTC 

51 AGCACCGTGC AGCTGTACCG GCAGAGCAGC AAGCTCTATG GCTCCATCTT 

101 CACGGGGGCC AGCAAGTTCC GCTGTAAGGA CTGCAGCGCT GCCTACGACA 

151 CCCTGGTGGA GTTGACAGTG CACATGAACG AGACGGGGCA TTACCGCGAC 

201 GACAACCATG AGACCGATAA CAACAACCCC AAGCGCTGGT CCAAGCCTCG 

251 CAAACGCTCC TTGCTGGAAA TGGAAGGGAA GGAAGACGCC CAGAAGGTGC 

301 TGAAGTGCAT GTACTGTGGC CACTCCTTTG AGTCCCTGCA GGATTTGAGT 

351 GTCCATATGA TCAAAACAAA ACACTACCAA AAAGTGCCTC TGAAGGAACC 

401 CGTCACTCCT GTCGCCGCCA AAATCATCCC TGCCACTCGG AAGAAAGCTT 

451 CCCTGGAGCT GGAGCTCCCC AGCTCCCCAG ATTCCACAGG TGGAACCCCC 

501 AAAGCCACCA TCTCAGACAC CAACGATGCA CTTCAGAAGA ACTCCAACCC 

551 TTACATCACG CCAAATAATC GGTACGGCCA CCAGAATGGG GCCAGCTATG 

601 CATGGCACTT TGAGGCCCGG AAGTCGCAGA TCCTGAAGTG CATGGAGTGT 

651 GGGAGCTCGC ATGACACCCT GCAGGAGCTC ACTGCCCACA TGATGGTCAC 

701 TGGCCACTTC ATCAAGGTCA CCAACTCTGC TATGAAAAAG GGGAAGCCCA 

751 TTGTGGAGAC GCCTGTCACA CCTACCATCA CAACCCTGCT GGATGAGAAG 

801 GTCCAGTCCG TGCCCCTGGC AGCCACCACC TTCACGTCCC CCTCCAATAC 

851 ACCTGCCAGC ATCTCCCCAA AACTGAATGT GGAGGTCAAG AAGGAAGTCG 

901 ACAAGGAGAA AGCGGTCACT GACGAGAAAC CTAAGCAAAA AGACAAGCCT 

951 GGCGAAGAAG AGGAGAAGTG TGACATCTCT TCCAAATACC ATTACTTGAC 

1001 TGAAAATGAC TTAGAAGAGA GTCCCAAGGG GGGGCTTGAT ATCCTCAAAT 

1051 CCTTGGAAAA CACAGTGACA TCCGCAATCA ACAAGGCCCA GAACGGCACT 

1101 CCTAGCTGGG GGGGCTATCC CAGCATCCAT GCCGCCTACC AACTTCCCAA 

1151 CATGATGAAG TTGTCCCTGG GCTCGTCGGG GAAGAGCACG CCCCTGAAAC 

1201 CCATGTTTGG CAACAGTGAG ATTGTCTCCC CGACGAAAAA CCAGACCCTG 

1251 GTCTCTCCAC CCAGCAGCCA GACGTCCCCC ATGCCCAAGA CAAACTTTCA 

1301 TGCCATGGAG GAGCTGGTGA AAAAGGTCAC TGAGAAAGTT GCCAAAGTGG 

1351 AGGAGAAGAT GAAGGAGCCG GATGGGAAGC TTTCCCCGCC CAAGCGGGCC 

1401 ACTCCCTCCC CATGTAGCAG CGAAGTCGGG GAACCCATCA AGATGGAGGC 

1451 ATCCAGCGAT GGGGGCTTCC GCAGCCAGGA GAACAGCCCC AGCCCCCCGC 

1501 GGGATGGGTG CAAGGATGGG AGCCCCCTCG CTGAGCCGGT GGAGAATGGC 

1551 AAGGAGCTGG TGAAGCCCCT AGCCAGCAGT TTGAGTGGCA GCACGGCCAT 

1601 CATCACCGAC CACCCGCCTG AACAGCCTTT TGTTAACCCT TTGAGCGCCC 

1651 TGCAGTCAGT CATGAACATT CACCTGGGCA AGGCCGCCAA GCCCTCCCTG 

1701 CCTGCCCTGG ACCCCATGAG CATGCTTTTC AAGATGAGCA ACAGCCTGGC 

1751 GGAGAAGGCT GCTGTGGCCA CCCCGCCGCC CCTGCAGTCC AAGAAGGCAG 

1801 ACCACCTCGA CCGCTATTTC TACCACGTCA ACAACGACCA GCCCATAGAC 

1851 TTGACAAAAG GGAAGAGTGA CAAAGGCTGC TCCTTGGGTT CAGTGCTTCT 

1901 GTCACCCACG TCCACAGCCC CGGCAACCTC CTCATCCACG GTGACAACGG 

1951 CAAAGACATC TGCCGTCGTA TCATTCATGT CAAACTCGCC GCTACGCGAG 

2001 AATGCCTTGT CAGATATATC CGATATGCTG AAGAACTTGA CAGAGAGCCA 

2051 CACGTCAAAA TCCTCCACTC CTTCCAGCAT CTCCGAGAAG TCTGACATTG 

2101 ACGGGGCCAC TCTGGAGGAG GCTGAGGAGT CGACGCCCGC CCAGAAGAGG 

2151 AAGGGCCGCC AGTCAAACTG GAACCCCCAG CACCTCCTGA TCCTCCAGGC 

2201 CCAGTTTGCC GCCAGCCTCC GGCAGACCTC AGAAGGGAAG TACATCATGT 

2251 CAGACCTGAG CCCCCAGGAG CGGATGCATA TCTCCAGGTT CACCGGGCTG 

2301 TCCATGACCA CCATCAGCCA CTGGCTGGCC AACGTGAAAT ACCAGCTTCG 

2351 AAGGACAGGT GGAACAAAGT TCCTCAAAAA CTTGGACACT GGCCACCCCG 

2401 TCTTCTTTTG TAACGATTGT GCGTCCCAAA TCAGGACTCC TTCCACGTAC 

2451 ATCAGTCACC TAGAGTCACA CTTAGGCTTC CGGCTACGGG ACTTATCCAA 

2501 ACTGTCCACC GAACAGATTA ACAGTCAGAT AGCACAAACC AAGTCACCGT 

2551 CAGAAAAAAT GGTGACGTCC TCCCCCGAGG AAGACCTGGG GACTTCCTAT 

2601 CAGTGCAAAC TTTGCAATCG GACCTTTGCC AGCAAGCACG CTGTTAAACT 

2651 TCACCTTAGC AAAACACACG GGAAATCTCC GGAAGACCAC CTTCTGTATG 
2701 TCTCTGAGTT AGAGAAGCAG TAGCATTTGC TTTTGATAGA AAGGACTGCA 
2751 GTTTGCTTTG AGGGAAACTG TGGAAGGCAC CTTCAGGCCC CCTCTGACTT 
2801 GTTGTTCTTG GCACATGTTC TTATTTTAAC TGCAGAGAAT CACTCTGGGC 
2851 TGGACTGTTT TGTATAACTG TACAGTGTTT AATAGAGGTG CATAATCAGC 
2901 TGTTGTTACT GGTAAAATAT GAAGGTTAAA ATGCAGTGGT AAGTGTTTGG 
2951 AACTTTGTGT AAACGGGATT TAGTTGTGAG CATCCTCCCG ATGCTTCAAG 
3001 CTGCATGCAT TAACAGACAG TTTAATTAAG CATTTATAAC GGAATCAGGC 
3051 ACACCTTTTC CACGAGACTC GAGTGTGCTG GCATTTCTCA CCCTTTCATC 
3101 TTTAGCCCTC TGAGTACTTT GAAGCACTTT TGCATTAATT TGGTTAAAAA 
3151 ATAAAATAAA ATAATAATAA TGTATGAAGC TCTGTTTTTT AAACTCCTTA 
3201 CCAGCTTAGT TATAATGAAT AATATGAACC TCCATTTATG CAGGTCTGCA 
3251 GGGGTATAAC ACGCCTTGAA ATTTAAAAGA ATATTATTTT CACATTGAAA 
3301 CATAGATGTA TATATTGTAT AGATTTCAGA CTCTCTTATG AAAAAAAATG 
3351 TGATTGTGGT TAAATGACCT TTTTCTTGCA TTTATAGCAA CAGTGTTTTA 
3401 TGCACCTGCT ATGCTCTGGG CATAAGCTGT GCCTATGTAT AGTGTATATT 
3451 TCTTTTTTTC TTTTTTTTAA GGTCTATGGG TTTTGTTTTT TACATGCAAA 
3501 CATTGTAAAT TATACAGAAG ATACCACAGA TAGCATTTAT AAAGTATACA 
3551 GAAACATTAT CTGAAAGCAA AGTATGATAG TTTGTTTTGC TATACAGTAC 
3601 ATCTATATTG ATAGAGGTTC ATGTTTAAAT TATACATATT TATTAGCATC 
3651 ATATTGTCAT TTGTTTTGAG CAGTCTGAAT AAACGAGACC GGGAAAGACA 
3701 TCCCTGGCAG GCATCAGAAC TATTTTGCAC ATGATTTTTA AAGGTATTTA 
3751 TTAGAAATCA AAGAACACTC AAAATAAACT CAGTGCTCAA AGGGTTAAGT 
3801 CTATTTGAAA AGGTTAAAAA AAAGAACAAA AAAAAAAAAA GAACTTGTAC 
3851 TGTATTTCCT AAACATTGAT AAAGCCTTTA AAATGTTTGT ACTGTAATAC 
3901 TTTGCTTAAA AGTCATGAGG CATTCTGTGA TCCAACCTCT TTCACTTATT 
3951 TATAAGCCCT CTTGGTTGCT ATTCCATATT GTAGGATGCC TTTCTATTTC 
4001 AATTGGTAAC TTTCTGTTTT GTTCTTCCTA ATTATTCTCC CAAGATCCCA 
4051 CACTGCAGCT TTATCTTTAG GCTTATGAAA GGTAACCCGT GGTTACCGGC 
4101 TCTCCAAGTG ATTCTGTTCT TCTCCATTTT TGGCAGTTAA TTTGCAGAAG 
4151 TAACTGACAG CTGACACCAT ATGAGAACCT TTGTATAAAA TATTGGCATG 
4201 TAAACAGCAC AGACACCGTA ACACACTCTG TGCCCTGTTT GGTTGTTGAC 
4251 AATGAAGCAC CATTATGTGA CTCTTCATAT AACCCTTTTT TCTACGGCAG 
4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
4401 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 27 bp to 2720 bp; peptide length: 898 
Category: strong similarity to known protein 



1 MLPEPSLFST VQLYRQSSKL YGSIFTGASK FRCKDCSAAY DTLVELTVHM 
51 NETGHYRDDN HETDNNNPKR WSKPRKRSLL EMEGKEDAQK VLKCMYCGHS 
101 FESLQDLSVH MIKTKHYQKV PLKEPVTPVA AKIIPATRKK ASLELELPSS 
151 PDSTGGTPKA TISDTNDALQ KNSNPYITPN NRYGHQNGAS YAWHFEARKS 
201 QILKCMECGS SHDTLQELTA HMMVTGHFIK VTNSAMKKGK PIVETPVTPT 
251 ITTLLDEKVQ SVPLAATTFT SPSNTPASIS PKLNVEVKKE VDKEKAVTDE 
301 KPKQKDKPGE EEEKCDISSK YHYLTENDLE ESPKGGLDIL KSLENTVTSA 
351 INKAQNGTPS WGGYPSIHAA YQLPNMMKLS LGSSGKSTPL KPMFGNSEIV 
401 SPTKNQTLVS PPSSQTSPMP KTNFHAMEEL VKKVTEKVAK VEEKMKEPDG 
451 KLSPPKRATP SPCSSEVGEP IKMEASSDGG FRSQENSPSP PRDGCKDGSP 
501 LAEPVENGKE LVKPLASSLS GSTAIITDHP PEQPFVNPLS ALQSVMNIHL 
551 GKAAKPSLPA LDPMSMLFKM SNSLAEKAAV ATPPPLQSKK ADHLDRYFYH 
601 VNNDQPIDLT KGKSDKGCSL GSVLLSPTST APATSSSTVT TAKTSAVVSF 
651 MSNSPLRENA LSDISDMLKN LTESHTSKSS TPSSISEKSD IDGATLEEAE 
701 ESTPAQKRKG RQSNWNPQHL LILQAQFAAS LRQTSEGKYI MSDLSPQERM 
751 HISRFTGLSM TTISHWLANV KYQLRRTGGT KFLKNLDTGH PVFFCNDCAS 
801 QIRTPSTYIS HLESHLGFRL RDLSKLSTEQ INSQIAQTKS PSEKMVTSSP 
851 EEDLGTSYQC KLCNRTFASK HAVKLHLSKT HGKSPEDHLL YVSELEKQ 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21j15, frame 3 

TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds., N = 1, Score = 
1039, P = 5.5e-105 

PIR:A38437 probable homeotic protein tsh - fruit fly (Drosophila 
melanogaster), N = 3, Score = 158, P = 7.2e-09 

TREMBL:CE33058_1 gene: "unc-89"; product: "UNC-89"; Caenorhabditis 
elegans UNC-89 (unc-89) gene, complete cds., N = 2, Score = 175, P = 
3.3e-07 



>TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 
Length = 687 

HSPs: 



Score = 1039 (155.9 bits), Expect = 5.5e-105, P = 5.5e-105 
Identities = 244/504 (48%), Positives = 319/504 (63%) 
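
The percentages printed next to the identity and positive counts can be recomputed from the fractions. A minimal sketch follows; `percent` is a hypothetical helper, and the use of truncation rather than rounding is an inference from the figures in this listing (for example 268/434 is printed as 61%), not a statement about how the search program is implemented.

```python
def percent(matches, aligned_length):
    """Integer percentage, truncated toward zero (assumed convention)."""
    return 100 * matches // aligned_length


# (identities, positives, aligned length) for HSPs reported in this record.
hsps = [(244, 319, 504), (211, 268, 434), (32, 47, 95), (13, 20, 29)]

for identities, positives, length in hsps:
    print(
        f"Identities = {identities}/{length} ({percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({percent(positives, length)}%)"
    )
```

Rounding to the nearest integer would give 62% for 268/434 and 45% for 13/29, so truncation is the reading that matches every percentage shown for this hit.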



Query: 170 QKNSNPYITPNNRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFI 229 
           QK +NPY+TPNNRYG+QNGASY W FEARK+QILKCMECGSSHDTLQ+LTAHMMVTGHF+ 
Sbjct:  14 QKAANPYVTPNNRYGYQNGASYTWQFEARKAQILKCMECGSSHDTLQQLTAHMMVTGHFL 73 

Query: 230 KVTNSAMKKGKPIVETPVTPTITTLLDEKVQSVPLAATTFTS-PSNT----PASISPKLN 284 
           KVT SA KKGK +V  PV       ++EK+QS+PL  TT T  P+++    P S + 
Sbjct:  74 KVTTSASKKGKQLVLDPV-------VEEKIQSIPLPPTTHTRLPASSIKKQPDSPAGSTT 126 

Query: 285 VEVKKEVDKEKA-VTDEKPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSL 343 
           E KKE +KEK V + K K++ + EK + S+ Y YL E DL++SPKGGLDILKSL 
Sbjct: 127 SEEKKEPEKEKPPVAGDAEKIKEESEDSLEKFEPSTLYPYLREEDLDDSPKGGLDILKSL 186 

Query: 344 ENTVTSAINKAQNGTPSWGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMF-GNSEIVSP 402 
           ENTV++AI+KAQNG PSWGGYPSIHAAYQLP +K L ++ +S ++P + G + +S 
Sbjct: 187 ENTVSTAISKAQNGAPSWGGYPSIHAAYQLPGTVK-PLPAAVQSVQVQPSYAGGVKSLSS 245 

Query: 403 TKNQTLVSPPSSQTSPMPKTNFHAMEELVKKVTEKV-AKVEEKMKEPDGKLSPPKRATPS 461 
           ++ L+ P S T P K+N AMEELV+KVT KV K EE+ E + K S K A S 
Sbjct: 246 AEHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKE-KSSLAKAA--S 302 

Query: 462 PCSSEVGEPIKMEASSDGGFRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSG 521 
           P + E + K E S + Q+ P K PL NG E +K ++ 
Sbjct: 303 PIAKENKDFPKTEEVSG K PQKKG PEAET WEAKKEG PL DVHT PNGT E PLKAKVTNGCN 359 

Query: 522 STAIITDHPPEQPFVNPLSALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVA 581 
           + II DH PE F+NPLSALQS+MN HLGK +KP P+LDP++ML+K+SNS+ +K 
Sbjct: 360 NLGIIMDHSPEPSFINPLSALQSIMNTHLGKVSKPVSPSLDPLAMLYKISNSMLDKPVYP 419 

Query: 582 TPPPLQSKKADHLDRYFYHVNNDQPIDLTKGKSDK-GCSLGSVLLSPTSTAPATSSSTVT 640 
           P K+AD +DRY+Y N+DQPIDLTK K+ S+ + SP + S + 
Sbjct: 420 ATPV---KQADAIDRYYYE-NSDQPIDLTKSKNKPLVSSVADSVASPLRESALMDISDMV 475 

Query: 641 TAKTSAVVSFMSN-SPLRENALSDISDMLKNLTE 673 
           T+ SS + E + +DS +LE 
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDE 509 

Score = 865 (129.8 bits), Expect = 7.4e-95, P = 7.4e-95 
Identities = 211/434 (48%), Positives = 268/434 (61%) 

Query: 447 EPDGKLSPPKRATPSPCSSEVG--EPIKMEASSDGGFRSQENSPSPPRDG-CKDGSPLAE 503 
           E+LP TPPSV E+++ ++EP + K SP+A+ 
Sbjct: 247 EHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKEKSSLAKAASPIAK 306 

Query: 504 P-VE--NGKELVK-PLASSLSGSTAIITD-HPPE--QPFVNPLSALQSVMNIHLG 551 
           P E +GK KPA+ DHP +P ++ ++I+ 
Sbjct: 307 ENKDFPKTEEVSGKPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCNNLGIIMD 366 

Query: 552 KAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYHVNN DQPID 608 
           + +PS ++P+S L+N+ K+ PL DL Y ++N D+P+ 
Sbjct: 367 HSPEPSF-- INPLSALQSIMNTHLGKVSKPVSPSL DPL-AMLYKISNSMLDKPV- 417 

Query: 609 LTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENALSDISDML 668 
           K S P + + S+V ++ SPLRE+AL DISDM+ 
Sbjct: 418 -YPATPVKQADAIDRYYYENSDQPIDLTKSKNKPLVSSVADSVA-SPLRESALMDISDMV 475 

Query: 669 KNLTESHTSKSSTPSSISEKSDIDGATLEEA-EESTPAQKRKGRQSNWNPQHLLILQAQF 727 
           KNLT T KSSTPS++SEKSD DG++ EEA +E +P KRKGRQSNWNPQHLLILQAQF 
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDELSPVHKRKGRQSNWNPQHLLILQAQF 535 

Query: 728 AASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 787 
           A+SLR+T+EGKYIMSDL PQER+HIS+FTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 
Sbjct: 536 ASSLRETTEGKYIMSDLGPQERVHISKFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 595 

Query: 788 TGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKSPSEKMV- 846 
           TGHPVFFCNDCASQ RT STYISHLE+HLGF L+DLSKL QI Q +K + K + 
Sbjct: 596 TGHPVFFCNDCASQFRTASTYISHLETHLGFSLKDLSKLPLNQIQEQQNVSKVLTNKTLG 655 

Query: 847 -TSSPEEDLGTSYQCKLCNRTFASK 870 
           + EEDLG+++QCKLCNRTFA + 
Sbjct: 656 PLGATEEDLGSTFQCKLCNRTFAKQ 680 

Score = 98 (14.7 bits), Expect = 7.4e-95, P = 7.4e-95 
Identities = 32/95 (33%), Positives = 47/95 (49%) 

Query:  90 KVLKCMYCGHSFESLQDLSVHMIKTKHYQKVPL KEPVT- PVAAKI I PATRKKAS 142 
           ++LKCM CG S ++LQ L+ HM+ T H+ KV K+ V PV + I + + 
Sbjct:  45 QILKCMECGSSHDTLQQLTAHMMVTGHFLKVTTSASKKGKQLVLDPVVEEKIQSIPLPPT 104 

Query: 143 LELELPSS P D S TGGT P KAT I S DTN DALQKN S N P 175 
           LP+S PDS G+ T S+ +K P 
Sbjct: 105 THTRLPASSIKKQPDSPAGS TTSEEKKEPEKEKPP 139 

Score = 81 (12.2 bits), Expect = 4.6e-93, P = 4.6e-93 
Identities = 13/29 (44%), Positives = 20/29 (68%) 

Query:  28 ASKFRCKDCSAAYDTLVELTVHMNETGHY 56 
           A   +C +C +++DTL +LT HM  TGH+ 
Sbjct:  44 AQILKCMECGSSHDTLQQLTAHMMVTGHF 72 



Pedant information for DKFZphtes3_21j15, frame 3 



Report for DKFZphtes3_21j15 . 3 



[LENGTH] 898 
[MW] 98486.72 
[pI] 8.61 
[HOMOL] TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo sapiens 
antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 0.0 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[PIRKW] zinc finger 1e-06 
[PIRKW] DNA binding 1e-06 
[PIRKW] transcription regulation 1e-06 
[PROSITE] MYRISTYL 9 
[PROSITE] ZINC_FINGER_C2H2 4 
[PROSITE] CAMP_PHOSPHO_SITE 5 
[PROSITE] CK2_PHOSPHO_SITE 19 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 15 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] Zinc finger, C2H2 type 
[KW] Alpha_Beta 
[KW] LOW COMPLEXITY 11.36 % 



SEQ MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHMNETGHYRDDN 

SEG 

PRD ccccceeeeeeeeccccceeeeeeeccccceeecccchhhhhhhhhhhcccccccccccc 

SEQ HETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHSFESLQDLSVHMIKTKHYQKV 

SEG 

PRD cccccccccccccccchhhhhhhccchhhhhhhhhcccccchhhhheeeeeeeecceeee 

SEQ PLKEPVTPVAAKIIPATRKKASLELELPSSPDSTGGTPKATISDTNDALQKNSNPYITPN 

SEG xxxxxxxxxx 

PRD eccccccceeeeeeehhhhhhhhhhcccccccccccccceeeeccchhhhhccccccccc 

SEQ NRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFIKVTNSAMKKGK 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhceeeeeccccccccc 

SEQ PIVETPVTPTITTLLDEKVQSVPLAATTFTSPSNTPASISPKLNVEVKKEVDKEKAVTDE 

SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccccccccccccccccccccccccccccccchhhhhhcc 

SEQ KPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSLENTVTSAINKAQNGTPS 

SEG x 

PRD ccccccccccccccchhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhcccccc 

SEQ WGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMFGNSEIVSPTKNQTLVSPPSSQTSPMP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ KTNFHAMEELVKKVTEKVAKVEEKMKEPDGKLSPPKRATPSPCSSEVGEPIKMEASSDGG 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeecccc 

SEQ FRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSGSTAIITDHPPEQPFVNPLS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccc 

SEQ ALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYH 

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccceeee 

SEQ VNNDQPIDLTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENA 

SEG xxxxxxxxxxxxxxxxxxxxxxxx 

PRD ecccccceeecccccccccccceeecccccccccccceeeeceeeeeeeeccccccchhh 

SEQ LSDISDMLKNLTESHTSKSSTPSSISEKSDIDGATLEEAEESTPAQKRKGRQSNWNPQHL 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccccccccccceeecccccchhhhhhhhccchhhhhhcccccccchh 

SEQ LILQAQFAASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGT 

SEG 

PRD hhhhhhhhhhhhhccccceeecccccchhhhhhhhccccchhhhhhhhhhhhhhhhcccc 

SEQ KFLKNLDTGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKS 

SEG 

PRD ceeecccccccceeecccceeeecccchhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcc 

SEQ PSEKMVTSSPEEDLGTSYQCKLCNRTFASKHAVKLHLSKTHGKSPEDHLLYVSELEKQ 

SEG 

PRD ccceeeeccccccccceeehhhhhhhhhhhhhhhhhccccccccccceeeeeeecccc 



Prosite for DKFZphtes3_21j15.3 



PS00001   51->55     ASN_GLYCOSYLATION   PDOC00001 
PS00001   405->409   ASN_GLYCOSYLATION   PDOC00001 
PS00001   670->674   ASN_GLYCOSYLATION   PDOC00001 
PS00001   864->868   ASN_GLYCOSYLATION   PDOC00001 
PS00004   69->73     CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   75->79     CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   139->143   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   432->436   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   456->460   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   17->20     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   137->140   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   157->160   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   280->283   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   318->321   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   332->335   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   384->387   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   435->438   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   588->591   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   614->617   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   641->644   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   676->679   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   686->689   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   730->733   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   842->845   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   42->46     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   78->82     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   103->107   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   149->153   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   161->165   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   210->214   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   214->218   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   253->257   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   325->329   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   573->577   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   684->688   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   689->693   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   695->699   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   745->749   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   810->814   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   840->844   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   848->852   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   884->888   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   893->897   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   732->740   TYR_PHOSPHO_SITE    PDOC00007 
PS00007   883->892   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   22->28     MYRISTYL            PDOC00008 
PS00008   156->162   MYRISTYL            PDOC00008 
PS00008   188->194   MYRISTYL            PDOC00008 
PS00008   362->368   MYRISTYL            PDOC00008 
PS00008   479->485   MYRISTYL            PDOC00008 
PS00008   494->500   MYRISTYL            PDOC00008 
PS00008   498->504   MYRISTYL            PDOC00008 
PS00008   617->623   MYRISTYL            PDOC00008 
PS00008   757->763   MYRISTYL            PDOC00008 
PS00028   795->816   ZINC_FINGER_C2H2    PDOC00028 
PS00028   860->882   ZINC_FINGER_C2H2    PDOC00028 
PS00028   33->56     ZINC_FINGER_C2H2    PDOC00028 
PS00028   94->117    ZINC_FINGER_C2H2    PDOC00028 



Pfam for DKFZphtes3_21j15.3 



HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR..T.H* 

C++ C ++ + +L+ HM+ H 
Query 33 CKD--CSAAYDTLVELTVHMNET-GH 

26.69 (bits) f: 94 t: 116 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR..T.H* 

C + CG +F + +L HM+ H 

dkfzphtes3 94 CMY--CGHSFESLQDLSVHMIKT-KH 116 

Query f: 795 t: 815 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ C R++S+++ H+ +H 
Query 795 CND--CASQIRTPSTYISHLESH 815 

27.12 (bits) f: 860 t: 881 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33" 

Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR.T.H* 

C+ C++TF +++ + H+ H 

dkfzphtes3 860 CKL--CNRTFASKHAVKLHLSK-TH 881 
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group: intracellular transport and trafficking 

DKFZphtes3_21l16 encodes a novel 66 amino acid protein nearly identical to rat ribosome 
attached membrane protein 4 (RAMP4). 

The novel protein seems to be the human ortholog of rat RAMP4. RAMP4 is involved in the 
regulation of translocation of proteins into the endoplasmic reticulum, e.g. of the MHC class II 
associated invariant (gamma) chain. 

The new protein can find application in modulation of protein translocation into the 
endoplasmic reticulum. 

identical to rat ribosome attached membrane protein 4 

ORF Bp 316-513 (66 aa) see BLASTX 

Sequenced by LMU 

Locus : unknown 

Insert length: 2488 bp 

Poly A stretch at pos. 2464, polyadenylation signal at pos. 2442 

1 CTTCCTCTTT CACTCCGCGC TCACGGCGGC GGCCAAAGCG GCGGCGACGG 
51 CGGCGCGAGA ACGACCCGGC GGCCAGTTCT CTTCCTCCTG CGCACCTGCC 

101 CCGCTCGGTC AGTCAGTCGG CGGCCGGCGC CCGGCTTGTG CTCAGACCTC 

151 GCGCTTGCGG CGCCCAGGCC CAGCGGCCGT AGCTAGCGTC TGGCCTGAGA 

201 ACCTCGGCGC TCCGGCGGCG CGGGCACCAC GAGCCGAGCC TCGCAGCGGC 

251 TCCAGAGGAG GCAGGCGAGT GAGCGAGTCC GAGGGGTGGC CGGGGCAGGT 

301 GGTGGCGCCG CGAAGATGGT CGCCAAGCAA AGGATCCGTA TGGCCAACGA 

351 GAAGCACAGC AAGAACATCA CCCAGCGCGG CAACGTCGCC AAGACCTCGA 

401 GAAATGCCCC CGAAGAGAAG GCGTCTGTAG GACCCTGGTT ATTGGCTCTC 

451 TTCATTTTTG TTGTCTGTGG TTCTGCAATT TTCCAGATTA TTCAAAGTAT 

501 CAGGATGGGC ATGTGAAGTG ACTGACCTTA AGATGTTTCC ATTCTCCTGT 

551 GAATTTTAAC TTGAACTCAT TCCTGATGTT TGATACCCTG GTTGAAAACA 

601 ATTCAGTAAA GCATCCTGCC TCAGAATGAC TTTCCTATCA TGCTTCATGT 

651 GTCATTCCAA GGTTTCTTCA TGAGTCATTC CAAGTTTTCT AGTCCATACC 

701 ACAGTGCCTT GCAAAAAACA CCACATGAAT AAAGCAATAA AATTTGATTG 

751 TTAAGATACA GTAGTGGACC CTACTTATTC AGTCAATTAA GAGTAAGTTT 

801 TTTTATGTGG TTATTAAAAC AGTATGAACA ATTAGTCTAA CTCTGCATAG 

851 ACAGGGTCTA GATTTTGTTA ACCCAAATGT ATAACTGCAG TTAGCTTAAA 

901 TTACAATTTG AAGTCTTGTG GTTTTTATAT AGCTAGGCAC TTTATTACTC 

951 TTTTGAACTG AAAGCACACT CCCTTATAGG TTCATGTAAC TGTCCTGTAA 
1001 TAAGGTGCTT ATAAATGGAA CAACTACACA GCCTAGTTTT GCCACAACCT 
1051 TTAGCATCTA AAAAGTTTTA AAAGCTTCTA AATGTCTAAT ATAAAGGGAG 
1101 ATGCTTATAG CCACAACATC TATTTTACCA ATATTGTTTC CATTACACTA 
1151 CCTTGGATTT TGCATGAGTG AGTATAGTAA CCCAAGATGC CATAAAAAAA 
1201 AACTTGATCG TTTTCTGACT TAATTAGTTA CTGTGGTTTC ACTAAAAGCT 
1251 ACCGTGGTGG AGTGAAGTCA GTCAGGGAAG GTTTGTTTAT GTTACATTTA 
1301 TTTCACCAGA ACTATTTTAA TATATCAAAG GGGTTTACTA TGCCAAACAA 
1351 AATTCTAGGG AAAAATACTG CTAAAAATGG ATGCCTCATC AGAACATGCT 
1401 GTTGAGTCCA ATGTGCCATA AGACATTTTA GCATGTTAAA TAGCACTTTT 
1451 AATAGCAAAA AAAGGCACAT CAACTGCGAA GTTATCCTTA GTTTGCAAAT 
1501 GCTTTTTCTA GATTAATGAT TTTTCAATCA TTAGGGTACT AGACACATCA 
1551 GCCTAAAGTG GCATCTGGAA TTGAATGGAT TTACTGATAA TGATCAGTCT 
1601 TTAGTCTTCC CTTTGTTATA TGACTTTATA GGTTATGATT GATCAAATTT 
1651 ACGTTTTACT AATGGTAAGG GTGAGGGTCA TAGGGCAGGT TTTGGGTTTT 
1701 CTAGTACTGT TGAAAACTGC AAGTATTGGC TATTTGTATA CTTAGCCATA 
1751 ACTTGGTGAA AAAAAACCTG AGCAGTGTCT ATGTATTAAT GCGTTGGAAA 
1801 GAAAGCTGCT TGTGTTTGCT TTGTTAATTG CCTCAGGATA TTTCTTTTAA 
1851 AATAAGCTGT TTTAAGAGGA ACAGAAGGGA AATCTGCTAC CTAGTCTATA 
1901 CACAGCGTGA ACCTCACAGG GGGCTTCTGA TACCCTCAAA CATGGAGAAC 
1951 AGTAAGGGAG CAGAGTGGTT AAGGACTTTC AGGAACTTAA CTATTCTGGA 
2001 ATAAGGAATG AATCAACTGA CCTTGGGCCA GCAGGTTTTT AACTAAATTG 
2051 TTACTTGCCT TTCTCACCCA GTTAATCAGT CTCTGTACTT GTTTCCCTTT 
2101 TTGAAACAAG TGTCTTGGTT AACTAATTCT GTTTTATGGT TGTGCTAAAT 
2151 TCATAGCAGG TGCCTTATTC TTTGCTTTTA GTCAAACCAT TCCATATCAG 
2201 AATTTTCCTT GGTTTACTAT AGATATTTGG CTTTAAGTTG TTGTTTGTGT 
2251 TTTTTAATGT ACAATGTTCT GATAAATTTG ACTGTTAAAT TGCTATAGCT 
2301 AGCAATCATT TTACATATGT AAAAAATTGC ATTCCCTTTG TATTTCATGT 
2351 GTAATTCACC AATTAAGTGC AGTTTATATT CAGGTTGGAT TATGCATGTT 
2401 TAGGTAAACG AAAGCTGTGT CTTACTTGAT TTATTCTTTA AAAATAAAGT 
2451 TCCCTGAATA TTTGAAAAAA AAAAAAAAAA AAAAAAAA 
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BLAST Results 



Entry HSCDN13 from database EMBL: 

H. sapiens (TL5) mRNA from LNCaP cell line 

Score = 1075, P = 5.8e-41, identities = 219/221 

Entry AF100470_1 from database TREMBLNEW: 

gene: "RAMP4"; product: "ribosome attached membrane protein 4"; Rattus 
norvegicus ribosome attached membrane protein 4 (RAMP4) mRNA, complete 
cds. 

Score = 331, P = 3.9e-28, identities = 66/66, positives = 66/66, frame 
+1 

Entry HSG19910 from database EMBL: 
human STS A002B4 8. 
Score = 530, P = 2.1e-17, identities = 108/109 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 316 bp to 513 bp; peptide length: 66 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic 



1 MVAKQRIRMA NEKHSKNITQ RGNVAKTSRN APEEKASVGP WLLALFIFVV 
51 CGSAIFQIIQ SIRMGM 
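
The ORF coordinates and the peptide above can be cross-checked by translating the cDNA. The following is a minimal sketch and not part of the original report: it assumes 1-based inclusive coordinates and the standard genetic code, and it copies nucleotides 316 to 516 (the stated ORF plus the next codon) from the insert sequence listed above. The names `CODON_TABLE`, `ORF_WITH_STOP` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order
# (assumption: the report uses the standard table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotides 316..516 (1-based) of the insert, copied from the listing above.
ORF_WITH_STOP = (
    "ATGGTCGCCAAGCAAAGGATCCGTATGGCCAACGA"
    "GAAGCACAGCAAGAACATCACCCAGCGCGGCAACGTCGCCAAGACCTCGA"
    "GAAATGCCCCCGAAGAGAAGGCGTCTGTAGGACCCTGGTTATTGGCTCTC"
    "TTCATTTTTGTTGTCTGTGGTTCTGCAATTTTCCAGATTATTCAAAGTAT"
    "CAGGATGGGCATGTGA"
)


def translate(dna: str) -> str:
    """Translate in frame from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate(ORF_WITH_STOP)
orf_length_bp = 513 - 316 + 1  # inclusive coordinates from the report

print(peptide)
print(len(peptide), orf_length_bp // 3)
```

Under these assumptions the translation ends at a TGA codon immediately after position 513, and the peptide length equals the ORF length in base pairs divided by three.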

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21l16, frame 1 

TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated 
membrane protein RAMP4"; Rattus norvegicus mRNA for ribosome 
associated membrane protein RAMP4, N = 1, Score = 331, P = 6.2e-30 

TREMBL:AF100470_1 gene: "RAMP4"; product: "ribosome attached membrane 
protein 4"; Rattus norvegicus ribosome attached membrane protein 4 
(RAMP4) mRNA, complete cds., N = 1, Score = 331, P = 6.2e-30 



>TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated membrane 
protein ramp4"; Rattus norvegicus mRNA for ribosome associated membrane 
protein RAMP4 

Length = 75 

HSPs: 

Score = 331 (49.7 bits), Expect = 6.2e-30, P = 6.2e-30 
Identities = 66/66 (100%), Positives = 66/66 (100%) 

Query: 1 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 60 

MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 
Sbjct: 10 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 69 

Query: 61 SIRMGM 66 

SIRMGM 
Sbjct: 70 SIRMGM 75 
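
The percentages printed next to the Identities and Positives counts can be recomputed from the fractions. The sketch below is illustrative only: the helper name `blast_percent` is hypothetical, and the choice of truncation rather than rounding is an inference from the figures printed in the HSP headers of this listing, not a statement about the BLAST implementation.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    return 100 * matches // aligned


# (matches, aligned length, percentage as printed in the HSP headers)
REPORTED = [
    (66, 66, 100),
    (369, 435, 84),
    (395, 435, 90),
    (32, 95, 33),
    (47, 95, 49),
    (13, 29, 44),
    (20, 29, 68),
]

for matches, aligned, shown in REPORTED:
    print(f"{matches}/{aligned}: computed {blast_percent(matches, aligned)}%, reported {shown}%")
```

Rounding to the nearest integer would give 85% for 369/435, so truncation is the reading that agrees with every header quoted here.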

No Pedant data available 




DKFZphtes3_21n23 
group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with strong similarity to rat 7acomp 
protein. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

strong similarity to rat 7acomp protein 
on genomic level encoded by AF107885 
Sequenced by LMU 
Locus: /map="14q24.3" 
Insert length: 3122 bp 

Poly A stretch at pos. 3070, polyadenylation signal at pos. 3045 

1 GGAAAACCTC GTGGGCTCAG CCCGGGACAA AGGGCCAGGG AAGTTGGGTG 

51 GTTCTGTGCT TGGTCTGTCA ATGGAGGAGA TCAAAGTTTT ACGAAGGGTG 

101 AAGGAGGAGA ATGATCGGCG AGGTGGATTT ATTCGCATAT TTCCTACATC 

151 TGAGACATGG GAAATATATG GGTCCTACCT CGAGCATAAG ACCTCAATGA 

201 ACTATATGCT GGCAACACGC CTCTTCCAGG ACAGGGGAAA CCCAAGAAGA 

251 AGCTTATTGA CAGGAAGAAC ACGAATGACT GCTGATGGAG CGCCAGAATT 

301 GAAGATAGAG AGTCTGAATT CAAAGGCCAA GCTGCATGCT GCACTTTACG 

351 AGAGGAAGCT CCTGTCTCTG GAGGTGCGAA AACGTAGACG ACGGAGTAGC 

401 AGATTGAGGG CAATGAGGCC AAAATACCCA GTGATTACCC AACCAGCTGA 

451 AATGAATGTT AAAACTGAGA CAGAGAGTGA AGAGGAGGAA GAAGTCGCAT 

501 TAGATAATGA AGATGAAGAA CAGGAGGCTT CCCAGGAGGA GTCTGCAGGA 

551 TTTCTTAGAG AAAATCAAGC CAAATATACA CCCTCATTGA CAGCTTTGGT 

601 AGAAAATACA CCCAAAGAAA ATTCCATGAA AGTTCGTGAA TGGAATAATA 

651 AAGGTGGACA CTGCTGCAAA CTTGAGACTC AGGAGCTAGA GCCTAAATTT 

701 AACCTGATGC AGATTCTTCA AGATAATGGC AATCTTAGCA AAATGCAGGC 

751 CCGAATAGCA TTCTCTGCCT ATCTCCAGCA TGTTCAAATT CGCCTGATGA 

801 AAGACAGTGG CGGTCAGACG TTCAGTGCCA GTTGGGCTGC CAAAGAGGAT 

851 GAACAGATGG AGCTGGTTGT TCGTTTCCTC AAGCGAGCAT CAAATAACCT 

901 CCAGCATTCA CTGAGGATGG TATTACCCAG TCGACGATTG GCACTTCTGG 

951 AACGCAGAAG AATCCTGGCC CACCAGCTGG GTGACTTTAT CATTGTATAC 

1001 AACAAGGAAA CAGAACAAAT GGCTGAAAAG AAATCAAAGA AGAAAGTTGA 

1051 GGAAGAAGAG GAAGATGGGG TGAATATGGA AAACTTTCAG GAGTTCATCA 

1101 GACAAGCAAG TGAGGCTGAA CTGGAGGAGG TGTTGACTTT TTATACCCAA 

1151 AAGAACAAGT CTGCTAGTGT CTTCCTGGGG ACTCACTCTA AAATTTCTAA 

1201 GAACAACAAC AATTATTCTG ATAGTGGGGC AAAAGGTGAT CACCCTGAGA 

1251 CTATAATGGA AGAAGTGAAA ATAAAGCCAC CTAAACAGCA ACAGACGACA 

1301 GAAATTCATT CTGATAAATT ATCTCGATTT ACCACTTCAG CAGAAAAAGA 

1351 GGCAAAATTA GTTTATAGCA ATTCCTCCTC TGGTCCTACT GCTACTCTGC 

1401 AGAAAATTCC CAACACCCAT TTGTCATCTG TTACAACCTC TGACCTCTCT 

1451 CCAGGGCCTT GCCACCATTC TTCTTTATCT CAAATTCCTT CAGCTATCCC 

1501 CAGCATGCCT CACCAGCCAA CAATTTTACT GAACACAGTC TCTGCCAGTG 

1551 CTTCTCCCTG CCTACATCCC GGGGCACAGA ACATCCCAAG CCCTACTGGC 

1601 CTGCCACGCT GTCGATCAGG AAGTCACACC ATTGGTCCCT TTTCTTCCTT 

1651 CCAAAGTGCT GCACACATCT ATAGCCAGAA ACTGTCTCGT CCCTCTTCAG 

1701 CAAAGGCAGG ATCGTGCTAT CTAAACAAGC ATCATTCAGG AATAGCCAAA 

1751 ACACAAAAAG AGGGAGAAGA TGCTTCTTTA TATAGCAAAC GGTACAACCA 

1801 AAGTATGGTT ACAGCTGAAC TTCAGCGGCT AGCTGAGAAG CAGGCAGCGA 

1851 GACAGTATTC TCCATCCAGC CACATCAACC TCCTCACCCA ACAGGTAACA 

1901 AACCTGAATT TGGCAACTGG CATCATAAAC AGAAGCAGTG CTTCAGCTCC 

1951 CCCAACCCTC CGACCCATCA TCAGTCCTAG TGGCCCGACA TGGTCTACAC 

2001 AGTCAGACCC CCAAGCTCCC GAGAATCACT CCAGCTCTCC TGGAAGCAGG 

2051 AGCCTGCAGA CAGGGGGATT TGCCTGGGAA GGAGAAGTAG AAAACAACGT 

2101 GTACAGCCAG GCTACAGGGG TGGTCCCCCA GCACAAGTAT CACCCCACAG 

2151 CAGGCAGCTA TCAGCTTCAA TTTGCCCTGC AGCAACTTGA ACAACAAAAA 

2201 CTTCAGTCCC GGCAGCTCCT GGACCAGAGT CGAGCCCGGC ACCAGGCAAT 

2251 CTTTGGCAGC CAGACACTAC CTAACTCCAA TTTATGGACA ATGAATAATG 

2301 GTGCAGGTTG TAGAATTTCC AGTGCCACAG CTAGTGGCCA GAAGCCAACC 

2351 ACTCTGCCAC AAAAAGTGGT ACCACCTCCA AGTTCTTGCG CCTCCCTGGT 

2401 TCCCAAACCC CCACCCAACC ACGAACAAGT GCTCAGAAGG GCAACATCCC 

2451 AGAAAGCTTC CAATACCCGC TTCAGATCCT CCTTTCAAAA CTATTTGTGG 

2501 TATTTCTTCC AAGCAGTCAG CTGAACTGAG GACGACAGCC TACAAACAAC 

2551 TACATGCATC TGAACTGTCT CTTGTAAATG AGCTTTTTTC AGAGCCAGAA 

2601 TCATACTCTC CAGGAAATAT GGAGAAAGAA ACCTGAGGAG ATTGAAGTTT 

2651 GCCAGGCACA AGGGCAAAAC TCAGACTGAA TGAATTTGAA AGGGTGGGGC 

2701 CAAAGATGTT GTAACCTGGG AGACTTCTCT GAAGAAAGAA AACTGTTTAA 
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2751 GAAACACAGA CTGAACTGCA GTACTTTTCC TTAAATAGCT GAGATGACCT 
2801 TCTTTACCCT GGGCTTAGGT GATTCTCATC AGGGTGACCT GAGTGGAAGT 
2851 TGGTGGTAAC GACTGTTCTG TGTCAGCACC CAGGACAGTG GTGTCTGTTA 
2901 AGGCTGCCAG GGATTAGCAG GGAGGAAAGC CATCAGGACT GGGTAGCCTG 
2951 GTAGCACCAA ATCCCAATTA ATGTTACCTG AACATGTGGT GAGGTCAGCC 
3001 GTATGATGAA AGATGTTTAA GAGATTAATG TCAGAAGAAT ATGAAAATAA 
3051 ACACCGGCTT AAAAAATGTT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3101 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AF107885 from database EMBL: 

Homo sapiens chromosome 14q24.3 clone BAC270M14 transforming growth 
factor-beta 3 (TGF-beta 3) gene, complete cds; and unknown genes. 
Score = 3042, P = 3.0e-219, identities = 610/612 
5 exons matching 1893-3070 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 71 bp to 2521 bp; peptide length: 817 
Category: strong similarity to known protein 



1 MEEIKVLRRV KEENDRRGGF IRIFPTSETW EIYGSYLEHK TSMNYMLATR 
51 LFQDRGNPRR SLLTGRTRMT ADGAPELKIE SLNSKAKLHA ALYERKLLSL 
101 EVRKRRRRSS RLRAMRPKYP VITQPAEMNV KTETESEEEE EVALDNEDEE 
151 QEASQEESAG FLRENQAKYT PSLTALVENT PKENSMKVRE WNNKGGHCCK 
201 LETQELEPKF NLMQILQDNG NLSKMQARIA FSAYLQHVQI RLMKDSGGQT 
251 FSASWAAKED EQMELVVRFL KRASNNLQHS LRMVLPSRRL ALLERRRILA 
301 HQLGDFIIVY NKETEQMAEK KSKKKVEEEE EDGVNMENFQ EFIRQASEAE 
351 LEEVLTFYTQ KNKSASVFLG THSKISKNNN NYSDSGAKGD HPETIMEEVK 
401 IKPPKQQQTT EIHSDKLSRF TTSAEKEAKL VYSNSSSGPT ATLQKIPNTH 
451 LSSVTTSDLS PGPCHHSSLS QIPSAIPSMP HQPTILLNTV SASASPCLHP 
501 GAQNIPSPTG LPRCRSGSHT IGPFSSFQSA AHIYSQKLSR PSSAKAGSCY 
551 LNKHHSGIAK TQKEGEDASL YSKRYNQSMV TAELQRLAEK QAARQYSPSS 
601 HINLLTQQVT NLNLATGIIN RSSASAPPTL RPIISPSGPT WSTQSDPQAP 
651 ENHSSSPGSR SLQTGGFAWE GEVENNVYSQ ATGVVPQHKY HPTAGSYQLQ 
701 FALQQLEQQK LQSRQLLDQS RARHQAIFGS QTLPNSNLWT MNNGAGCRIS 
751 SATASGQKPT TLPQKVVPPP SSCASLVPKP PPNHEQVLRR ATSQKASNTR 
801 FRSSFQNYLW YFFQAVS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_21n23, frame 2 

TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds., N = 1, Score = 1845, P = 2.2e-190 

TREMBL:AF107885_3 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 443, P = 5.3e-41 

TREMBL:AF107885_4 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 265, P = 8.2e-22 



>TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds. 

Length = 436 

HSPs: 

Score = 1845 (276.8 bits), Expect = 2.2e-190, P = 2.2e-190 
Identities = 369/435 (84%), Positives = 395/435 (90%) 
Query: 115 
       MRPKYPVIT PAEMN+KTETESEEEEEV LDNEDEEQEASQEESAG L ENQAKYTPSLT 
Sbjct: 1 

Query: 175 
       +VEN+P+EN+MKV EW NKG CCK+ETQE E KFNLMQILQDNGNLSK+QAR+AFSAY 
Sbjct: 61 

Query: 235 
       LQHVQ+RL KDSGGQT S SWAAKEDEQMELVVRFLKRAS+NLQHSLRMVLPSRRLALLE 
Sbjct: 121 

Query: 295 
       RRRILAHQLGDFI+VYNKETEQMAEKKSKKK+EEEEEDGVN E+FQEFIRQASEAELEEV 
Sbjct: 181 

Query: 355 
       LTFYTQKNKSASVFLGTHSK SKN+++YSDSGAKGDHPETI +EVKIK PKQQQ TEIHS 
Sbjct: 241 

Query: 415 
       DKLSRFTTSA KEAKLVY+N SS GP A L Q++P+THLSS+ TTS LS GP HHSSLS 
Sbjct: 300 

Query: 471 
       QI AIPSMPHQ +LLN V SASP +HPG N+ SP GLPRCRSGS+TIGPFSSFQSA 
Sbjct: 360 

Query: 531 
       AHIYSQKLSRPSSAKAG 
Sbjct: 419 



Pedant information for DKFZphtes3_21n23, frame 2 
Report for DKFZphtes3_21n23.2 



[LENGTH] 817 
[MW] 91522.09 
[pI] 9.32 
[HOMOL] TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein mRNA, complete cds. 1e-166 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 15 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 13.83 % 



SEQ MEEIKVLRRVKEENDRRGGFIRIFPTSETWEIYGSYLEHKTSMNYMLATRLFQDRGNPRR 

SEG 

PRD ccchhhhhhhhhhhccccceeeecccccceeeecceeeecccchhhhhhhhhhhcccccc 

SEQ SLLTGRTRMTADGAPELKIESLNSKAKLHAALYERKLLSLEVRKRRRRSSRLRAMRPKYP 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeeccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLTALVENT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ceeeccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhccccceeeeeccc 

SEQ PKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAYLQHVQI 

SEG 

PRD cccccceeeeeccccccccchhhhhhhccchhhhhhhcccchhhhhhhhhhhhhhhhhhh 

SEQ RLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLERRRILA 

SEG xxxxxxxxxxxxxxx 

PRD hhhhcccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhh 

SEQ HQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEVLTFYTQ 

SEG xxxxxxxxxxxxx 

PRD hhccceeeeeehhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHSDKLSRF 

SEG 

PRD ccccceeeecccccccccccccccccccccccchhhhhhhccccccceeeeecccccccc 
SEQ TTSAEKEAKLVYSNSSSGPTATLQKIPNTHLSSVTTSDLSPGPCHHSSLSQIPSAIPSMP 

SEG 

PRD hhhhhhhheeeecccccccceeeecccccccccccccccccccccccccccccccccccc 

SEQ HQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSAAHIYSQKLSR 

SEG 

PRD cccceeeeccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccc 

SEQ PSSAKAGSCYLNKHHSGIAKTQKEGEDASLYSKRYNQSMVTAELQRLAEKQAARQYSPSS 

SEG 

PRD cccccccceeeecccccccccccccccceeeecchhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ HINLLTQQVTNLNLATGIINRSSASAPPTLRPIISPSGPTWSTQSDPQAPENHSSSPGSR 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccceeeecccccccccccccccccccccccccc 

SEQ SLQTGGFAWEGEVENNVYSQATGVVPQHKYHPTAGSYQLQFALQQLEQQKLQSRQLLDQS 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccccccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RARHQAIFGSQTLPNSNLWTMNNGAGCRISSATASGQKPTTLPQKVVPPPSSCASLVPKP 

SEG 

PRD hhhhhhhhccccccccceeeeccccceeeeeeeccccccccccceeecccccceeecccc 

SEQ PPNHEQVLRRATSQKASNTRFRSSFQNYLWYFFQAVS 

SEG 

PRD cccchhhhhhhhhhhcccccccccccceeeeeeeccc 



Prosite for DKFZphtes3_21n23.2 



PS00001   221->225   ASN_GLYCOSYLATION   PDOC00001 
PS00001   362->366   ASN_GLYCOSYLATION   PDOC00001 
PS00001   381->385   ASN_GLYCOSYLATION   PDOC00001 
PS00001   434->438   ASN_GLYCOSYLATION   PDOC00001 
PS00001   576->580   ASN_GLYCOSYLATION   PDOC00001 
PS00001   620->624   ASN_GLYCOSYLATION   PDOC00001 
PS00001   652->656   ASN_GLYCOSYLATION   PDOC00001 
PS00004   106->110   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   107->111   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   271->275   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   789->793   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   64->67     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   109->112   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   180->183   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   185->188   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   280->283   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   287->290   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   322->325   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   359->362   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   414->417   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   535->538   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   543->546   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   561->564   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   572->575   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   629->632   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   793->796   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   35->39     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   132->136   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   134->138   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   136->140   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   154->158   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   347->351   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   394->398   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   422->426   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   455->459   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   561->565   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   643->647   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   563->572   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   195->201   MYRISTYL            PDOC00008 
PS00008   248->254   MYRISTYL            PDOC00008 
PS00008   510->516   MYRISTYL            PDOC00008 
PS00008   557->563   MYRISTYL            PDOC00008 
PS00008   746->752   MYRISTYL            PDOC00008 
PS00008   756->762   MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphtes3_21n23.2) 
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DKFZphtes3_22c23 



group: testes derived 

DKFZphtes3_22c23 encodes a novel 223 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, 3 EST hits (two from a testis library) 
Sequenced by LMU 
Locus: /map="9q34" 
Insert length: 1113 bp 

Poly A stretch at pos. 1073, polyadenylation signal at pos. 1055 



1 GGTGGGCAAA GGCATCTTCC TCTGGGAAGG ACTGGCACAA GCACTTGGTC 
51 CCTGGGTTGT GTGCCTGGGA GGCCGGGATC AGGGCTGGCC CTCTTTCTCC 
101 CTGGCAAAGC AAAACCTCCC TTTTACTACT ATCAAGGGGA AGTAACTTGA 
151 AGGTGCCTGT GGCAGGCAGC ACCTTGAGCC AACAGGAACC ATTGACATGC 
201 GAGGCCCAGG GCAGGCAGAC TGTGCAGTGG CCATTGGGCG GCCCCTCGGG 
251 GAGGTGGTGA CCCTCCGCGT CCTTGAGAGT TCTCTCAACT GCAGTGCGGG 
301 GGACATGTTG CTGCTTTGGG GCCGGCTCAC CTGGAGGAAG ATGTGCAGGA 
351 AGCTGTTGGA CATGACTTTC AGCTCCAAGA CCAACACGCT GGTGGTGAGG 
401 CAGCGCTGCG GGCGGCCAGG AGGTGGGGTG CTGCTGCGGT ATGGGAGCCA 
451 GCTTGCTCCT GAAACCTTCT ACAGAGAATG TGACATGCAG CTCTTTGGGC 
501 CCTGGGGTGA AATCGTGAGC CCCTCGCTGA GTCCAGCCAC GAGTAATGCA 
551 GGGGGCTGCC GGCTCTTCAT TAATGTGGCT CCGCACGCAC GGATTGCCAT 
601 CCATGCCCTG GCCACCAACA TGGGCGCTGG GACCGAGGGA GCCAATGCCA 
651 GCTACATCTT GATCCGGGAC ACCCACAGCT TGAGGACCAC AGCGTTCCAT 
701 GGGCAGCAGG TGCTCTACTG GGAGTCAGAG AGCAGCCAGG CTGAGATGGA 
751 GTTCAGCGAG GGCTTCCTGA AGGCTCAGGC CAGCCTGCGG GGCCAGTACT 
801 GGACCCTCCA ATCATGGGTA CCGGAGATGC AGGACCCTCA GTCCTGGAAG 
851 GGAAAGGAAG GAACCTGAGG GTCATTGAAC ATTTGTTCCG TGTCTGGCCA 
901 GCCCTGGAGG GTTGACCCCT GGTCTCAGTG CTTTCCAATT CGAACTTTTT 
951 CCAATCTTAG GTATCTACTT TAGAGTCTTC TCCAATGTCC AAAAGGCTAG 
1001 GGGGTTGGAG GTGGGGACTC TGGAAAAGCA GCCCCCATTT CCTCGGGTAC 
1051 CAATAAATAA AACATGCAGG CTGAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1101 AAAAAAAAAA AAA 
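
The reported poly A and polyadenylation signal positions can be related to the 3' end of the sequence above. The sketch below is illustrative and rests on assumptions that are not stated in the report: positions are read as 0-based offsets, the signal is taken to be the last AATAAA hexamer upstream of the tail, and a run of at least 10 A residues at the 3' end is treated as the poly A stretch. The names `TAIL`, `TAIL_OFFSET`, `polya_start` and `signal_start` are hypothetical.

```python
import re

# Last 63 bases of the 1113 bp insert (1-based positions 1051..1113),
# copied from the listing above.
TAIL = "CAATAAATAAAACATGCAGGCTG" + "A" * 40
TAIL_OFFSET = 1050  # 0-based index of the first base of TAIL in the insert

# Poly A stretch: a terminal run of A residues (threshold of 10 is arbitrary).
polya = re.search(r"A{10,}$", TAIL)
polya_start = TAIL_OFFSET + polya.start()

# Signal: last AATAAA hexamer located upstream of the poly A stretch.
signal_start = TAIL_OFFSET + TAIL.rfind("AATAAA", 0, polya.start())

print("poly A stretch at", polya_start)
print("polyadenylation signal at", signal_start)
```

Read this way, the offsets agree with the values given for this clone (1073 and 1055); with 1-based numbering each would be one higher.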



BLAST Results 



Entry HSAC1644 from database EMBL: 

Genomic sequence from Human 9q34, complete sequence. 
Score = 2072, P = 8.8e-225, identities = 422/430 
5 exons Bp 41969-38232 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 197 bp to 865 bp; peptide length: 223 
Category: putative protein 



1 MRGPGQADCA VAIGRPLGEV VTLRVLESSL NCSAGDMLLL WGRLTWRKMC 
51 RKLLDMTFSS KTNTLVVRQR CGRPGGGVLL RYGSQLAPET FYRECDMQLF 
101 GPWGEIVSPS LSPATSNAGG CRLFINVAPH ARIAIHALAT NMGAGTEGAN 
151 ASYILIRDTH SLRTTAFHGQ QVLYWESESS QAEMEFSEGF LKAQASLRGQ 
201 YWTLQSWVPE MQDPQSWKGK EGT 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22c23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_22c23, frame 2 



Report for DKFZphtes3_22c23.2 



[LENGTH] 223 

[MW] 24546.19 

[pI] 8.57 

[PROSITE] MYRISTYL 4 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] Alpha_Beta 



SEQ MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS 

PRD ccccccccceeeecccccceeeeehhhhhcccccchhhhhhchhhhhhhhhhhhhhhccc 

SEQ KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG 

PRD ccceeeeeecccccccceeeeccccccchhhhhhhhhccccccceeeecccccccccccc 

SEQ CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS 

PRD ceeeeeecccceeehhhhhhhhccccccccceeeeeecccccceeecccceeeeeccccc 

SEQ QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT 

PRD hhhhhhhcchhhhhhhhhhcccccccccccccccccccccccc 



Prosite for DKFZphtes3_22c23.2 



PS00001   31->35     ASN_GLYCOSYLATION   PDOC00001 
PS00001   150->154   ASN_GLYCOSYLATION   PDOC00001 
PS00005   22->25     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   45->48     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   59->62     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   161->164   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   196->199   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   216->219   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   33->37     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   5->11      MYRISTYL            PDOC00008 
PS00008   145->151   MYRISTYL            PDOC00008 
PS00008   148->154   MYRISTYL            PDOC00008 
PS00008   199->205   MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphtes3_22c23 . 2) 
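
The Prosite hits listed for this clone can be re-derived by scanning the peptide with the corresponding PROSITE patterns. This is a hedged sketch rather than the pipeline actually used for the report: the regular expressions are direct transcriptions of the PROSITE patterns for PS00001, PS00005, PS00006 and PS00008, overlapping matches are allowed, and the "start->end" values are read as a 1-based start followed by start plus motif length, which is an assumption about the report's convention. The names `SEQ`, `PATTERNS` and `scan` are hypothetical.

```python
import re

# 223-residue peptide of DKFZphtes3_22c23, frame 2 (from the SEQ lines above).
SEQ = (
    "MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS"
    "KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG"
    "CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS"
    "QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT"
)

PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",        # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",             # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",            # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",   # PS00008
}


def scan(seq: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, start + motif length) for every match, overlaps included."""
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1  # 1-based start position
        hits.append((start, start + len(m.group(1))))
    return hits


sites = {name: scan(SEQ, pattern) for name, pattern in PATTERNS.items()}
for name, hits in sites.items():
    print(name, ", ".join(f"{s}->{e}" for s, e in hits))
```

The lookahead form of the regular expression is what permits overlapping hits such as the two MYRISTYL sites starting at 145 and 148.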
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DKFZphtes3_22g2 

group: nucleic acid management 

DKFZphtes3_22g2 encodes a novel 1230 amino acid protein nearly identical to rat TIP120. 

TATA-binding protein (TBP) is a central component of transcriptional regulation and is a target 
for various transcription regulators. TBP-interacting protein 120 (TIP120) is a protein 
that interacts with TBP. The novel protein is the human ortholog of 
rat TIP120 and is considered to participate in transcription 
regulation through its interaction with TBP. 

The new protein can find application in modulation of gene transcription. 

KIAA0829, complete cds, nearly identical to rat TIP120 
complete cDNA, complete cds, EST hits, 
Sequenced by LMU 

Locus: /map="387.3 cR from top of Chr12 linkage group" 
Insert length: 5387 bp 

Poly A stretch at pos. 5352, polyadenylation signal at pos. 5335 

1 GGGAGCGAGT GCGGAGCGAG TGGGAGCGAG ACGGCCCTGA GTGGAAGTGT 

51 CTGGCTCCCC GTAGAGGCCC TTCTGTACGC CCCGCCGCCC ATGAGCTCGT 

101 TCTCACGCGA ACAGCGCCGT CGTTAGGCTG GCTCTGTAGC CTCGGCTTAC 

151 CCCGGGACAG GCCCACGCCT CGCCAGGGAG GGGGCAGCCC GTCGAGGCGC 

201 CTCCCTAGTC AGCGTCGGCG TCGCGCTGCG ACCCTGGAAG CGGGAGCCGC 

251 CGCGAGCGAG AGGAGGAGCT CCAGTGGCGG CGGCGGCGGC GGCAGCGGCA 

301 GCGGGCAGCA GCTCCAGCAG CGCCAGCAGG CGGGATCGAG GCCGTCAACA 

351 TGGCGAGCGC CTCGTACCAC ATTTCCAATT TGCTGGAAAA AATGACATCC 

401 AGCGACAAGG ACTTTAGGTT TATGGCTACA AATGATTTGA TGACGGAACT 

451 GCAGAAAGAT TCCATCAAGT TGGATGATGA TAGTGAAAGG AAAGTAGTGA 

501 AAATGATTTT GAAGTTATTG GAAGATAAAA ATGGAGAGGT ACAGAATTTA 

551 GCTGTCAAAT GTCTTGGTCC TTTAGTGAGT AAAGTGAAAG AATACCAAGT 

601 AGAGACAATT GTAGATACCC TCTGCACTAA CATGCTTTCT GATAAAGAAC 

651 AACTTCGAGA CATTTCAAGT ATTGGTCTTA AAACAGTAAT TGGAGAACTT 

701 CCTCCAGCTT CCAGTGGCTC TGCATTAGCT GCTAATGTAT GTAAAAAGAT 

751 TACTGGACGT CTTACAAGTG CAATAGCAAA ACAGGAAGAT GTCTCTGTTC 

801 AGCTAGAAGC CTTGGATATT ATGGCTGATA TGTTGAGCAG GCAAGGAGGA 

851 CTTCTTGTTA ATTTCCATCC TTCAATTCTG ACCTGTCTAC TTCCCCAGTT 

901 GACCAGCCCT AGACTTGCAG TGAGGAAAAG AACCATTATC GCTCTTGGCC 

951 ATCTGGTTAT GAGCTGTGGA AATATAGTTT TTGTAGATCT TATTGAACAT 

1001 CTGTTGTCAG AGTTGTCCAA AAATGATTCT ATGTCAACAA CAAGAACCTA 

1051 CATACAATGT ATTGCTGCTA TTAGTAGGCA AGCTGGTCAT AGAATAGGTG 

1101 AATACCTTGA GAAGATAATT CCTTTGGTGG TAAAATTTTG CAATGTAGAT 

1151 GATGATGAAT TAAGAGAGTA CTGTATTCAA GCCTTTGAAT CATTTGTAAG 

1201 AAGATGTCCT AAGGAAGTAT ATCCTCATGT TTCTACCATT ATAAATATTT 

1251 GTCTTAAATA TCTTACCTAT GATCCAAATT ATAATTACGA TGATGAAGAT 

1301 GAAGATGAAA ATGCAATGGA TGCTGATGGT GGTGATGATG ATGATCAAGG 

1351 GAGTGATGAT GAATACAGTG ATGATGATGA CATGAGTTGG AAAGTGAGAC 

1401 GTGCAGCTGC GAAGTGCTTG GATGCTGTAG TTAGCACAAG GCATGAAATG 

1451 CTTCCAGAAT TCTACAAGAC CGTCTCTCCT GCACTAATAT CCAGATTTAA 

1501 AGAGCGTGAA GAGAATGTAA AGGCAGATGT TTTTCACGCA TACCTTTCTC 

1551 TTTTGAAGCA AACTCGTCCT GTACAAAGTT GGCTATGTGA CCCTGATGCA 

1601 ATGGAGCAGG GAGAAACACC TTTAACAATG CTTCAGAGTC AGGTTCCCAA 

1651 CATTGTTAAA GCTCTTCACA AACAGATGAA AGAAAAAAGT GTGAAGACCC 

1701 GACAGTGTTG TTTTAACATG TTAACTGAGC TGGTAAATGT ATTACCTGGG 

1751 GCCCTAACTC AACACATTCC TGTACTTGTA CCAGGAATCA TTTTCTCACT 

1801 GAATGATAAA TCAAGCTCAT CGAATTTGAA GATCGATGCT TTGTCATGTC 

1851 TATACGTAAT CCTCTGTAAC CATTCTCCTC AAGTCTTCCA TCCTCACGTT 

1901 CAGGCTTTGG TTCCTCCAGT GGTGGCTTGT GTTGGAGACC CATTTTACAA 

1951 AATTACATCT GAAGCACTTC TTGTTACTCA ACAGCTTGTC AAAGTAATTC 

2001 GTCCTTTAGA TCAGCCTTCC TCGTTTGATG CAACTCCTTA TATCAAAGAT 

2051 CTATTTACCT GTACCATTAA GAGATTAAAA GCAGCTGACA TTGATCAGGA 

2101 AGTCAAGGAA AGGGCTATTT CCTGTATGGG ACAAATTATT TGCAACCTTG 

2151 GAGACAATTT GGGTTCTGAC TTGCCTAATA CACTTCAGAT TTTCTTGGAG 

2201 AGACTAAAGA ATGAAATTAC CAGGTTAACT ACAGTAAAGG CATTGACACT 

2251 GATTGCTGGG TCACCTTTGA AGATAGATTT GAGGCCTGTT CTGGGAGAAG 

2301 GGGTTCCTAT CCTTGCTTCA TTTCTTAGAA AAAACCAGAG AGCTTTGAAA 

2351 CTGGGTACTC TTTCTGCCCT TGATATTCTA ATAAAAAACT ATAGTGACAG 

2401 CTTGACAGCT GCCATGATTG ATGCAGTTCT AGATGAGCTC CCACCTCTTA 

2451 TCAGCGAAAG TGATATGCAT GTTTCACAAA TGGCCATCAG TTTTCTTACC 

2501 ACTTTGGCAA AAGTATATCC CTCCTCCCTT TCAAAGATAA GTGGATCCAT 

2551 TCTCAATGAA CTTATTGGAC TTGTGAGATC ACCCTTATTG CAGGGGGGAG 
2601 CTCTTAGTGC CATGCTAGAC TTTTTCCAAG CTCTGGTTGT CACTGGAACA 
2651 AATAATTTAG GATACATGGA TTTGTTGCGC ATGCTGACTG GTCCAGTTTA 
2701 CTCTCAGAGC ACAGCTCTTA CTCATAAGCA GTCTTATTAT TCCATTGCCA 
2751 AATGTGTAGC TGCCCTTACT CGAGCATGCC CTAAAGAGGG ACCAGCTGTA 
2801 GTAGGTCAGT TTATTCAAGA TGTCAAGAAC TCAAGGTCTA CAGATTCCAT 
2851 TCGTCTCTTA GCTCTACTTT CTCTTGGAGA AGTTGGGCAT CATATTGACT 
2901 TAAGTGGACA GTTGGAACTA AAATCTGTAA TACTAGAAGC TTTCTCATCT 
2951 CCTAGTGAAG AAGTCAAATC AGCTGCATCC TATGCATTAG GCAGCATTAG 
3001 TGTGGGCAAC CTTCCTGAAT ATCTGCCGTT TGTCCTGCAA GAAATAACTA 
3051 GTCAACCCAA AAGGCAGTAT CTTTTACTTC ATTCCTTGAA GGAAATTATT 
3101 AGCTCTGCAT CAGTGGTGGG CCTTAAACCA TATGTTGAAA ACATCTGGGC 
3151 CTTATTACTA AAGCACTGTG AGTGTGCAGA GGAAGGAACC AGAAATGTTG 
3201 TTGCTGAATG TCTAGGAAAA CTCACTCTAA TTGATCCAGA AACTCTCCTT 
3251 CCACGGCTTA AGGGGTACTT GATATCAGGC TCATCATATG CCCGAAGCTC 
3301 AGTGGTTACG GCTGTGAAAT TTACAATTTC TGACCATCCA CAACCTATTG 
3351 ATCCACTGTT AAAGAACTGC ATAGGTGATT TCCTAAAAAC TTTGGAAGAC 
3401 CCAGATTTGA ATGTGAGAAG AGTAGCCTTG GTCACATTTA ATTCAGCAGC 
3451 ACATAACAAG CCATCATTAA TAAGGGATCT ATTGGATACT GTTCTTCCAC 
3501 ATCTTTACAA TGAAACAAAA GTTAGAAAGG AGCTTATAAG AGAGGTAGAA 
3551 ATGGGTCCAT TTAAACATAC GGTTGATGAT GGTCTGGATA TTAGAAAGGC 
3601 AGCATTTGAG TGTATGTACA CACTTCTAGA CAGTTGTCTT GATAGACTTG 
3651 ATATCTTTGA ATTTCTAAAT CATGTTGAAG ATGGTTTGAA GGACCATTAT 
3701 GATATTAAGA TGCTGACATT TTTAATGTTG GTGAGACTGT CTACCCTTTG 
3751 TCCAAGTGCA GTACTGCAGA GGTTGGACCG ACTTGTTGAG CCATTACGTG 
3801 CAACATGTAC AACTAAGGTA AAGGCAAACT CAGTAAAGCA GGAGTTTGAA 
3851 AAACAAGATG AATTAAAGCG ATCTGCCATG AGAGCAGTAG CAGCACTGCT 
3901 AACCATTCCA GAAGCAGAGA AGAGTCCACT GATGAGTGAA TTCCAGTCAC 
3951 AGATCAGTTC TAACCCTGAG CTGGCGGCTA TCTTTGAAAG TATCCAGAAA 
4001 GATTCATCAT CTACTAACTT GGAATCAATG GACACTAGTT AGATGTTTGT 
4051 TCACCATGGG GACCATTACA TATGACCATA CAATGCACTG AATTGACAGG 
4101 TTAATCATAA GACATGGAAA GAGAAGTGTC TAAAAGCTTC AAAATGTTCC 
4151 ACTTTTTTTT CCTTCATGGA GACTGTTTGT TTGGCTTTCT TCCATTGTTG 
4201 TTTTTGTAGC ATTTATTTCA GAAATGTGTA TTTCCATAAT CCAGAGGTTG 
4251 TAAAACCACT AGTGTTTTAG TGGTTACAGC AACATTTGAA ATGGAAACTA 
4301 AAAGTTAGGA TTTTATGGAG TATGGAGATA GGGTCCAGTA TCTATTTACC 
4351 CTGTAATGTT TAGGATTAAA ATGTTAAAAT TTTGTGACCA TGAATTTCTT 
4401 TCTTTTATAA ATTTTCTCAT TTAAAAATCA AAAATCTTGC AAAACAAAAA 
4451 CCATGTTTCT TTTTCTTGTA TAACTTTTTG TTTTCAGCAA CATAAATTGA 
4501 TTTTTAGCTG GCAGACAAGA ATATCCATAT AAGATTTGTT AACCATTTCA 
4551 GAGAGTTTGG CAATTTTTAA AAGATAATAA GGTATCATTT TTAAGTATGA 
4601 AAATTAACAA TATCCCTGTT GCGCACACTA ATTTTGCATG AGTAAGTTTA 
4651 CAAATATGTA TCGTCTGTAA AGCAGCATGT GCAGATTATT CATAATATAG 
4701 AAGTTAAAAT AAGTATTAGT GCAATTTTCA GATATTTATT TTTGCACAGA 
4751 AAACACATTA TCTGGAGAGA AAGAAAGGAG AATTTTTGAG ACTTGGGTTT 
4801 TCTTAATGCC AGTGTGAATT TGCAGATGTT TTCAGAAAAT CAAGTCACAG 
4851 TAACAATTTG CCACTTTTTT CTATTATAAA TCTTCTTACT TAAATTTTGA 
4901 ATATTTAGTT TTTCTCAGTT ACCCATTTGT GTGTGTGTGA TTCCACTTAG 
4951 AAATTCTTAA AACCAGATTT TTCTTTCATT CCGTTTGGAT GTCTACATTC 
5001 CTTATCAAAG GATATAAATA CTGTGTATGC TTTTGAATTT TATTTTTAGG 
5051 AAAATTCTGA AGCCAGCTAT CACAGGTTTG TTAGCTAATA ATAGTATTTT 
5101 CTTTTAGTTG AGTTAGGTTT TTCCCCATCT CCTGTAGAGC GAATTTACAT 
5151 ATTGTATTGG GTAAGTGTTC ACTACTTTTC CTGATTAAGG GATCTGTGCT 
5201 GGGGAACAAA GCTTTTGCAG TACCTTATAT TGTAGTTAAA ATTTTATTTA 
5251 ACATATCCTT CAGTGAGCTC ATTTCACACT GTAGCCTCTT CCTTAAAATT 
5301 TGTGGTGCTC CTGTAACAGT AAGAACTAAT TCTGAAATAA AAGACATCTC 
5351 CTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 

Entry HS793345 from database EMBL: 
human STS WI-12457. 
Score = 1985, P = 1.3e-83, identities = 433/460 

Medline entries 

97127450: 
Molecular cloning of a novel 120-kDa TBP-interacting protein. 



Peptide information for frame 2 



ORF from 350 bp to 4039 bp; peptide length: 1230 






WO 01/12659 



PCT/IB00/01496 



Category: known protein 
Classification: Nucleic acid management 
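
The ORF coordinates, reading frame and peptide length quoted above are related by simple arithmetic. A minimal sketch follows, assuming the coordinates are 1-based and inclusive and that the range excludes the stop codon (the listing does not say so explicitly, but the numbers are consistent with it). The function name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (peptide_length, frame) for an ORF.

    start, end: 1-based inclusive nucleotide coordinates.
    Assumption: the range does not include the stop codon.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return n_bases // 3, frame


# ORF from 350 bp to 4039 bp, as stated for this entry.
print(orf_summary(350, 4039))
```

For this entry the sketch gives 1230 residues in frame 2, matching the values stated above.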



1 MASASYHISN LLEKMTSSDK DFRFMATNDL MTELQKDSIK LDDDSERKVV 
51 KMILKLLEDK NGEVQNLAVK CLGPLVSKVK EYQVETIVDT LCTNMLSDKE 
101 QLRDISSIGL KTVIGELPPA SSGSALAANV CKKITGRLTS AIAKQEDVSV 
151 QLEALDIMAD MLSRQGGLLV NFHPSILTCL LPQLTSPRLA VRKRTIIALG 
201 HLVMSCGNIV FVDLIEHLLS ELSKNDSMST TRTYIQCIAA ISRQAGHRIG 
251 EYLEKIIPLV VKFCNVDDDE LREYCIQAFE SFVRRCPKEV YPHVSTIINI 
301 CLKYLTYDPN YNYDDEDEDE NAMDADGGDD DDQGSDDEYS DDDDMSWKVR 
351 RAAAKCLDAV VSTRHEMLPE FYKTVSPALI SRFKEREENV KADVFHAYLS 
401 LLKQTRPVQS WLCDPDAMEQ GETPLTMLQS QVPNIVKALH KQMKEKSVKT 
451 RQCCFNMLTE LVNVLPGALT QHIPVLVPGI IFSLNDKSSS SNLKIDALSC 
501 LYVILCNHSP QVFHPHVQAL VPPVVACVGD PFYKITSEAL LVTQQLVKVI 
551 RPLDQPSSFD ATPYIKDLFT CTIKRLKAAD IDQEVKERAI SCMGQIICNL 
601 GDNLGSDLPN TLQIFLERLK NEITRLTTVK ALTLIAGSPL KIDLRPVLGE 
651 GVPILASFLR KNQRALKLGT LSALDILIKN YSDSLTAAMI DAVLDELPPL 
701 ISESDMHVSQ MAISFLTTLA KVYPSSLSKI SGSILNELIG LVRSPLLQGG 
751 ALSAMLDFFQ ALVVTGTNNL GYMDLLRMLT GPVYSQSTAL THKQSYYSIA 
801 KCVAALTRAC PKEGPAVVGQ FIQDVKNSRS TDSIRLLALL SLGEVGHHID 
851 LSGQLELKSV ILEAFSSPSE EVKSAASYAL GSISVGNLPE YLPFVLQEIT 
901 SQPKRQYLLL HSLKEIISSA SVVGLKPYVE NIWALLLKHC ECAEEGTRNV 
951 VAECLGKLTL IDPETLLPRL KGYLISGSSY ARSSVVTAVK FTISDHPQPI 
1001 DPLLKNCIGD FLKTLEDPDL NVRRVALVTF NSAAHNKPSL IRDLLDTVLP 
1051 HLYNETKVRK ELIREVEMGP FKHTVDDGLD IRKAAFECMY TLLDSCLDRL 
1101 DIFEFLNHVE DGLKDHYDIK MLTFLMLVRL STLCPSAVLQ RLDRLVEPLR 
1151 ATCTTKVKAN SVKQEFEKQD ELKRSAMRAV AALLTIPEAE KSPLMSEFQS 
1201 QISSNPELAA IFESIQKDSS STNLESMDTS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22g2, frame 2 

TREMBL:AB020636_1 gene: "KIAA0829"; product: "KIAA0829 protein"; Homo 
sapiens mRNA for KIAA0829 protein, partial cds., N = 1, Score = 5986, P 
= 0 

TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus 
mRNA for TIP120, complete cds., N = 1, Score = 6203, P = 0 



>TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA 
for TIP120, complete cds. 
Length = 1,230 

HSPs: 

Score = 6203 (930.7 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 1227/1230 (99%), Positives = 1228/1230 (99%) 

Query: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 
Sbjct: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

Query: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 
Sbjct: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

Query: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 
Sbjct: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

Query: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 
Sbjct: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

Query: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 
Sbjct: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

Query: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 
Sbjct: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

Query: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 

VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 
Sbjct: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 






Query: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 
Sbjct: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

Query: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 
Sbjct: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

Query: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 
Sbjct: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

Query: 601 GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 
Sbjct: 601 GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

Query: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 
Sbjct: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

Query: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 
Sbjct: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

Query: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 
Sbjct: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

Query: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
Sbjct: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

Query: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 
Sbjct: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

Query: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 
Sbjct: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

Query: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

NVRRVALVTFNSAAHNKPSLIRDLLD+VLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 
Sbjct: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDSVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

Query: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 
Sbjct: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

Query: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 
Sbjct: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

Query: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 

QISSNPELAAIFESIQKDSSSTNLESMDTS 
Sbjct: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 



Pedant information for DKFZphtes3_22g2, frame 2 



Report for DKFZphtes3_22g2.2 



[LENGTH] 1230 

[MW] 136376.58 

[pI] 5.52 

[HOMOL] TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA for TIP120, complete cds. 0.0 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 
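
The LOW_COMPLEXITY percentage appears to be the share of residues masked with `x` in the SEG rows of the report that follows. A minimal sketch under that assumption is given here; the function name `masked_percent` is hypothetical, and the run lengths (4, 8, 37 and 16) are as they appear to read in the SEG rows below, which may be affected by text extraction.

```python
def masked_percent(seg_rows, peptide_length):
    """Percentage of residues marked 'x' or 'X' across all SEG rows."""
    masked = sum(ch in "xX" for row in seg_rows for ch in row)
    return 100.0 * masked / peptide_length


# Assumed run lengths of masked residues, read from the SEG rows of this report.
seg_rows = ["x" * 4, "x" * 8, "X" * 37, "x" * 16]
print(round(masked_percent(seg_rows, 1230), 2))
```

With 65 masked residues out of 1230, the result agrees with the 5.28 % reported above.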



SEQ MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 

SEG 

PRD cccccchhhhhhhhhcccccceeeeehhhhhhhhhcccccccccchhhhhhhhhhhhhcc 

MEM 

SEQ NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 

SEG xxxx 

PRD ccccceeeeeeeeceeeeehhhhhhhhhhhhccchhhhhcccccccchhhhhhhhhcccc 
MEM 

SEQ SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 

SEG xxxxxxxx 

PRD cccccchhhhhhhccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeeecchhhhhh 

MEM 

SEQ LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 

SEG 

PRD hcccccchhhhhhhhhhhheeeeecccceeehhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 

SEG 

PRD hhhhcccccccchhhhhhhhheeeeccchhhhhhhhhhhhhhhhccccceeecchhhhhh 

MEM 

SEQ CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD hhhhhccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 

SEG 

PRD hhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeecccccccc 

MEM 

SEQ GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhccccccccceeeecce 

MEM 

SEQ IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 

SEG xxxxxxxxxxxxxxxx 

PRD eeeeccccccccchhhhhhhheeeeecccccccccceeeeecceeeeecccchhhhhhhh 

MEM 

SEQ LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 

SEG 

PRD hhhhhhhhhhcccccccccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhheeeecc 

MEM 

SEQ GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 

SEG 

PRD cccccccccchhhhhhhhhcchhhhhhhhhhhheeeeccccccccceeehhhhhhhhhhh 

MEM 

SEQ KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 

SEG 

PRD cccccceeecchhhhhhhhhhhccccccchhhhhhhhhhhheeeecccccchhhhhhhhc 

MEM 

SEQ GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhh 

MEM 

SEQ SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 

SEG 

PRD hccccccccccccccccceeeeeeccccchhhhhhhhhhhccccccccccchhhhhhhhh 

MEM 

SEQ SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 

SEG 

PRD cccchhhhhhhhhhhhhhcccceeehhhhhhhhhhhhhhhhcccccceeeeecccccccc 

MEM 

SEQ IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhccccccccccchhhhhhhhhhhccccc 

MEM 

SEQ NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 

SEG 

PRD ccceeeeeeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccch 

MEM 
SEQ IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 

SEG 

PRD hhhhhhhhhhhhhhhccccccceeeecccccccccchhhhhhhhhhhhhhhhcccchhhh 

MEM 

SEQ RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 

SEG 

PRD hhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhh 

MEM 

SEQ QISSNPELAAIFESIQKDSSSTNLESMDTS 

SEG 

PRD hhhccchhhhhhhhhhhccccccccccccc 

MEM 

(No Prosite data available for DKFZphtes3_22g2.2) 
(No Pfam data available for DKFZphtes3_22g2.2) 

DKFZphtes3_22nl3 
group: testes derived 

DKFZphtes3_22nl3 encodes a novel 677 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

dJ1042K10.3, complete 
Sequenced by LMU 
Locus: /map="22q13.1-13.2" 
Insert length: 3353 bp 

Poly A stretch at pos. 3315, polyadenylation signal at pos. 3298 

1 ATGGAACCAC TATCCCCACT GCCAAGTCCA CCCCCACACT CATTAAGCAA 

51 AGCCAACCCA AGTCTGCCAG TGAGAAGTCA CAGCGCAGCA AGAAGGCCAA 

101 GGAGCTGAAG CCAAAGGTGA AGAAGCTCAA GTACCACCAG TACATCCCCC 

151 CGGACCAGAA GCAGGACAGG GGGGCACCCC CCATGGACTC ATCCTACGCC 

201 AAGATCCTGC AGCAGCAGCA GCTCTTCCTC CAGCTGCAGA TCCTCAACCA 

251 GCAGCAGCAG CAGCACCACA ACTACCAGGC CATCCTGCCT GCCCCGCCAA 

301 AGTCAGCAGG CGAGGCCCTG GGAAGCAGCG GGACCCCCCC AGTACGCAGC 

351 CTCTCCACTA CCAATAGCAG CTCCAGCTCG GGCGCCCCTG GGCCCTGTGG 

401 GCTGGCACGT CAGAACAGCA CCTCACTGAC TGGCAAGCCG GGAGCCCTGC 

451 CGGCCAACCT GGACGACATG AAGGTGGCAG AGCTGAAGCA GGAGCTGAAG 

501 TTGCGATCAC TGCCTGTCTC GGGCACCAAA ACTGAGCTGA TTGAGCGCCT 

551 TCGAGCCTAT CAAGACCAAA TCAGCCCTGT GCCAGGAGCC CCCAAGGCCC 

601 CTGCCGCCAC CTCTATCCTG CACAAGGCTG GCGAGGTGGT GGTAGCCTTC 

651 CCAGCGGCCC GGCTGAGCAC GGGGCCAGCC CTGGTGGCAG CAGGCCTGGC 

701 TCCAGCTGAG GTGGTGGTGG CCACGGTGGC CAGCAGTGGG GTGGTGAAGT 

751 TTGGCAGCAC GGGCTCCACG CCCCCCGTGT CTCCCACCCC CTCGGAGCGC 

801 TCACTGCTCA GCACGGGCGA TGAAAACTCC ACCCCCGGGG ACACCTTTGG 

851 TGAGATGGTG ACATCACCTC TGACGCAGCT GACCCTGCAG GCCTCGCCAC 

901 TGCAGATCCT CGTGAAGGAG GAGGGCCCCC GGGCCGGGTC CTGTTGCCTG 

951 AGCCCTGGGG GGCGGGCGGA GCTAGAGGGG CGCGACAAGG ACCAGATGCT 

1001 GCAGGAGAAA GACAAGCAGA TCGAGGCGCT GACGCGCATG CTCCGGCAGA 

1051 AGCAGCAGCT GGTGGAGCGG CTCAAGCTGC AGCTGGAGCA GGAGAAGCGA 

1101 GCCCAGCAGC CCGCCCCCGC CCCCGCCCCC CTCGGCACCC CCGTGAAGCA 

1151 GGAGAACAGC TTCTCCAGCT GCCAGCTGAG CCAGCAGCCC CTGGGCCCCG 

1201 CTCACCCATT CAACCCCAGC CTGGCGGCCC CAGCCACCAA CCACATAGAC 

1251 CCTTGTGCTG TGGCCCCAGG GCCCCCGTCC GTGGTGGTGA AGCAGGAAGC 

1301 CTTGCAGCCT GAGCCCGAGC CGGTCCCCGC CCCCCAGTTG CTTCTGGGGC 

1351 CTCAGGGCCC CGGCCTCATC AAGGGGGTTG CACCTCCCAC CCTCATCACC 

1401 GACTCCACAG GGACCCACCT TGTCCTCACC GTGACCAATA AGAATGCAGA 

1451 CAGCCCTGGC CTGTCCAGTG GGAGCCCCCA GCAGCCCTCG TCCCAGCCTG 

1501 GCTCTCCAGC GCCTGCCCCC TCTGCCCAGA TGGACCTGGA GCACCCACTG 

1551 CAGCCCCTCT TTGGGACCCC CACTTCTCTG CTGAAGAAGG AACCACCTGG 

1601 CTATGAGGAA GCCATGAGCC AGCAGCCCAA ACAGCAGGAA AATGGTTCCT 

1651 CAAGCCAGCA GATGGACGAC CTGTTTGACA TTCTCATTCA GAGCGGAGAA 

1701 ATTTCAGCAG ATTTCAAGGA GCCGCCATCC CTGCCAGGGA AGGAGAAGCC 

1751 ATCCCCGAAG ACAGTCTGTG GGTCCCCCCT GGCAGCACAG CCATCACCTT 

1801 CTGCTGAGCT CCCCCAGGCT GCCCCACCTC CTCCAGGCTC ACCCTCCCTC 

1851 CCTGGACGCC TGGAGGACTT CCTGGAGAGC AGCACGGGGC TGCCCCTGCT 

1901 GACCAGTGGG CATGACGGGC CAGAGCCCCT TTCCCTCATT GACGACCTCC 

1951 ATAGCCAGAT GCTGAGCAGC ACTGCCATCC TGGACCACCC CCCGTCACCC 

2001 ATGGACACCT CGGAATTGCA CTTTGTTCCT GAGCCCAGCA GCACCATGGG 

2051 CCTGGACCTG GCTGATGGCC ACCTGGACAG CATGGACTGG CTGGAGCTGT 

2101 CGTCAGGTGG TCCCGTGCTG AGCCTAGCCC CCCTCAGCAC CACAGCCCCC 

2151 AGCCTCTTCT CCACAGACTT CCTCGATGGC CATGATTTGC AGCTGCACTG 

2201 GGATTCCTGC TTGTAGCTCT CTGGCTCAAG ACGGGGTGGG GAAGGGGCTG 

2251 GGAGCCAGGG TACTCCAATG CGTGGCTCTC CTGCGTGATT CGGCCTCTCC 

2301 ACATGGTTGT GAGTCTTGAC AATCACAGCC CCTGCTTTTT CCCTTCCCTG 

2351 GGAGGCTAGA ACAGAGAAGC CCTTACTCCT GGTTCAGTGC CACGCAGGGC 

2401 AGAGGAGAGC AGCTGTCAAG AAGCAGCCCT GGCTCTCACG CTGGGGTTTT 

2451 GGACACACGG TCAGGGTCAG GGCCATTTCA GCTTGACCTC CTTTTTTGAG 

2501 GTCAGGGGGC ACTGTCTGTC TGGCTACAAT TTGGCTAAGG TAGGTGAAGC 

2551 CTGGCCAGGC GGGAGGCTTC TCTTCTGACC CAGGGCTGAG ACAGGTTAAG 

2601 GGGTGAATCT CCTTCCTTTC TCTCCCTGCT TTGCTGTGAA GGGAGAAATT 

2651 AGCCTGGGCC TCTACCCCCT ATTCCCTGTG TCTGCCAACC CCAGGATCCC 

2701 AGGGCTCCCT GCCATTTTAG TGTCTTGGTG TAGTGTAACC ATTTAGTGGT 

2751 TGGTGGCAAC AATTTTATGT ACAGGTGTAT ATACCTCTAT ATTATATATC 

2801 GACATACATA TATATTTTTG GGGGGGGGCG GACAGGAGAT GGGTGCAACT 
2851 CCCTCCCATC CTACTCTCAC AGAAGGGCCT GGATGCAAGG TTACCCTTGA 

2901 GCTGTGTGCC ACAGTCTGGT GCCCAGTCTG GCATGCAGCT ACCCAGGCCC 

2951 ACCCATCACG TGTGATTGAC ATGTAGGTAC CCTGCCACGG CCTATGCCCC 

3001 ACCTGCCCTG CTTCCTGGCT CCTTATCAGT GCCATGAGGG CAGAGGTGCT 

3051 ACCTGGCCTT CCTGCCAGGA GCTCTCCACC CACTCACATT CCGTCCCCGC 

3101 CGCCTCACTG CAGCCAGCGT GGCCCTAGGA CAGGAGGAGC TTCGGGCCCA 

3151 GCTTCACCCT GCGGTGGGGC TGAGGGGTGG CCATCTCCTG CCCTGGGGCC 

3201 ACTGGCTTCA CATTCTGGGC TGACTCATAG GGGAGTAGGG GTGGAGTCAC 

3251 CAAAACCAGT GCTGGGACAA AGATGGGGAA GGTGTGTGAA CTTTTTAAAA 

3301 TAAACACAAA AACACAGGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 AAG 



BLAST Results 



Entry HS1042K10 from database EMBL: 

Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. 
Contains the ADSL gene for Adenylosuccinate lyase (EC 4.3.2.2, 
Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Score = 7997, P = 0.0e+00, identities = 1617/1645 
7 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 183 bp to 2213 bp; peptide length: 677 
Category: similarity to unknown protein 
Classification: unclassified 



1 MDSSYAKILQ QQQLFLQLQI LNQQQQQHHN YQAILPAPPK SAGEALGSSG 
51 TPPVRSLSTT NSSSSSGAPG PCGLARQNST SLTGKPGALP ANLDDMKVAE 
101 LKQELKLRSL PVSGTKTELI ERLRAYQDQI SPVPGAPKAP AATSILHKAG 
151 EVVVAFPAAR LSTGPALVAA GLAPAEVVVA TVASSGVVKF GSTGSTPPVS 
201 PTPSERSLLS TGDENSTPGD TFGEMVTSPL TQLTLQASPL QILVKEEGPR 
251 AGSCCLSPGG RAELEGRDKD QMLQEKDKQI EALTRMLRQK QQLVERLKLQ 
301 LEQEKRAQQP APAPAPLGTP VKQENSFSSC QLSQQPLGPA HPFNPSLAAP 
351 ATNHIDPCAV APGPPSVVVK QEALQPEPEP VPAPQLLLGP QGPGLIKGVA 
401 PPTLITDSTG THLVLTVTNK NADSPGLSSG SPQQPSSQPG SPAPAPSAQM 
451 DLEHPLQPLF GTPTSLLKKE PPGYEEAMSQ QPKQQENGSS SQQMDDLFDI 
501 LIQSGEISAD FKEPPSLPGK EKPSPKTVCG SPLAAQPSPS AELPQAAPPP 
551 PGSPSLPGRL EDFLESSTGL PLLTSGHDGP EPLSLIDDLH SQMLSSTAIL 
601 DHPPSPMDTS ELHFVPEPSS TMGLDLADGH LDSMDWLELS SGGPVLSLAP 
651 LSTTAPSLFS TDFLDGHDLQ LHWDSCL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22nl3, frame 3 

TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable 
rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs 
and a putative CpG island., N = 1, Score = 1285, P = 4.9e-131 

TREMBL:CEUK06A9_3 gene: "K06A9.1a"; Caenorhabditis elegans cosmid 
K06A9., N = 2, Score = 149, P = 1.3e-09 

TREMBLNEW:SSI132828_1 product: "p210 protein"; Spermatozopsis similis 
mRNA for p210 protein, partial, N = 1, Score = 171, P = 2.8e-09 



>TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 
Length = 243 

HSPs: 

Score = 1285 (192.8 bits), Expect = 4.9e-131, P = 4.9e-131 
Identities = 243/243 (100%), Positives = 243/243 (100%) 

Query: 435 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 494 

PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 
Sbjct: 1 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 60 

Query: 495 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 554 

DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 
Sbjct: 61 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 120 

Query: 555 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 614 

SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 
Sbjct: 121 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 180 

Query: 615 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 674 

VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 
Sbjct: 181 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 240 

Query: 675 SCL 677 

SCL 
Sbjct: 241 SCL 243 



Pedant information for DKFZphtes3_22nl3, frame 3 



Report for DKFZphtes3_22nl3.3 



[LENGTH] 677 

[MW] 70743.01 

[pI] 4.93 

[HOMOL] TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel protein)"; 
Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. Contains the ADSL gene for 
Adenylosuccinate lyase (EC 4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with 
probable rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a putative 
CpG island. 1e-111 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 21.57 % 

[KW] COILED_COIL 4.58 % 

SEQ MDSSYAKILQQQQLFLQLQILNQQQQQHHNYQAILPAPPKSAGEALGSSGTPPVRSLSTT 

SEG xxxxxxxxxxxxxxxxxxx xxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhcceeeeeecccccceeeecccccccceeecccc 

COILS 

MEM 

SEQ NSSSSSGAPGPCGLARQNSTSLTGKPGALPANLDDMKVAELKQELKLRSLPVSGTKTELI 

SEG xxxxxx 

PRD cccccccccccceeecccccccccccccccccccchhhhhhhhhhhhhhcccccchhhhh 

COILS 

MEM 

SEQ ERLRAYQDQISPVPGAPKAPAATSILHKAGEVVVAFPAARLSTGPALVAAGLAPAEVVVA 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccccccceeeeeeeccceeeeccccccccccccccccccceeeeee 

COILS 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ TVASSGVVKFGSTGSTPPVSPTPSERSLLSTGDENSTPGDTFGEMVTSPLTQLTLQASPL 

SEG xxxxxxxx. . xxxxxxxxxxxxxx 

PRD eeecccccccccccccccccccccceeeeccccccccccccccceeecccceeeecccce 

COILS 

MEM M 

SEQ QILVKEEGPRAGSCCLSPGGRAELEGRDKDQMLQEKDKQIEALTRMLRQKQQLVERLKLQ 

SEG 

PRD eeeeeccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LEQEKRAQQPAPAPAPLGTPVKQENSFSSCQLSQQPLGPAHPFNPSLAAPATNHIDPCAV 
SEG xxxxxxxxxx 

PRD hhhhhhhhhcccccccccccccccccceeeeecccccccccccccceeeccccccccccc 

COILS CCCCCCC 

MEM 

SEQ APGPPSVVVKQEALQPEPEPVPAPQLLLGPQGPGLIKGVAPPTLITDSTGTHLVLTVTNK 

SEG xxxxxxxxxxxx 

PRD cccccceeeeeccccccccccccceeeccccccceeeeecccccccccccceeeeeeecc 

COILS 

MEM 

SEQ NADSPGLSSGSPQQPSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQ 

SEG . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccccccccccc 

COILS 

MEM 

SEQ QPKQQENGSSSQQMDDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPS 

SEG xxxxxxxxxxx 

PRD ccccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ AELPQAAPPPPGSPSLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAIL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhcccceee 

COILS 

MEM 

SEQ DHPPSPMDTSELHFVPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFS 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeeeccccceeeeeecccccccccc 

COILS 

MEM 

SEQ TDFLDGHDLQLHWDSCL 

SEG 

PRD cccccccceeecccccc 

COILS 

MEM 

(No Prosite data available for DKFZphtes3_22nl3.3) 
(No Pfam data available for DKFZphtes3_22nl3.3) 

DKFZphtes3_23111 



group: intracellular transport and trafficking 

DKFZphtes3_23111 encodes a novel 186 amino acid protein nearly identical to mouse 
ADP-ribosylation-like factor homolog 6 (Arl6). 

Protein secretion through the endoplasmic reticulum and the Golgi vesicular trafficking system 
is initiated by the binding of ADP-ribosylation factors (ARFs) to donor membranes, leading to 
recruitment of coatomer, bud formation, and eventual vesicle release. ARFs are approximately 
20-kDa GTPases that are active with bound GTP and inactive with bound GDP. The novel protein 
contains an ATP/GTP-binding site motif A (P-loop) and seems to be a novel ARF. It seems to 
have an important role in vesicular transport and trafficking. 

The new protein can find application in modulating vesicle transport and trafficking in cells. 



nearly identical to mouse Arl6, ADP-ribosylation-like factor homolog 
Start at bp 15 matches Kozak consensus ANNatgG 
Sequenced by LMU 
Locus: unknown 
Insert length: 717 bp 

Poly A stretch at pos. 689, no polyadenylation signal found 
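
The Kozak statement above can be checked directly against the first bases of the insert. A minimal sketch, assuming `ANNatgG` means an A at position -3 and a G at position +4 relative to the A of the start codon, and that "bp 15" is a 1-based coordinate; the function name `matches_kozak` is hypothetical.

```python
def matches_kozak(seq, atg_pos):
    """Check the minimal ANNatgG pattern around a start codon.

    seq: nucleotide sequence; atg_pos: 1-based position of the A of ATG.
    """
    i = atg_pos - 1  # 0-based index of the A of ATG
    if i < 3 or i + 4 > len(seq):
        return False  # not enough flanking sequence
    window = seq[i - 3 : i + 4].upper()  # 7 bases covering -3 .. +4
    return window[0] == "A" and window[3:6] == "ATG" and window[6] == "G"


# First 20 bases of the insert as printed below.
first_20 = "ATTTGAATCACATTATGGGA"
print(matches_kozak(first_20, 15))
```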



1 ATTTGAATCA CATTATGGGA TTGCTAGACA GACTTTCAGT CTTGCTTGGC 

51 CTGAAGAAGA AGGAGGTTCA TGTTTTGTGC CTTGGGCTAG ATAATAGTGG 

101 CAAAACGACG ATCATTAACA AACTTAAACC TTCAAATGCT CAATCTCAAA 

151 ATATCCTTCC AACAATAGGA TTCAGCATAG AGAAATTCAA ATCATCCAGT 

201 TTGTCATTTA CAGTGTTTGA CATGTCAGGT CAAGGAAGAT ACAGAAATCT 

251 CTGGGAACAC TATTATAAAG AAGGCCAAGC TATTATTTTT GTCATTGATA 

301 GTAGTGATAG ATTAAGAATG GTTGTGGCCA AAGAAGAACT CGATACTCTT 

351 CTGAATCATC CAGATATTAA ACACCGTCGA ATTCCAATCT TATTCTTTGC 

401 AAATAAAATG GATCTTAGAG ATGCAGTGAC ATCTGTAAAA GTGTCTCAGT 

451 TGCTGTGTTT AGAGAACATC AAAGATAAAC CCTGGCATAT TTGTGCTAGT 

501 GATGCCATAA AAGGAGAAGG CTTGCAAGAA GGTGTAGACT GGCTTCAAGA 

551 TCAGATCCAG ACTGTGAAGA CATGAAAAGA TAATAGTTGG AAACCTCAGC 

601 AATTTTCAAT TCAAGGAATC TATCTAAGAC AAATAGAATA CATTTTGTAA 

651 AAGATGTTTA TGCATCAAAA AATATAATTT TCTGCTTGCA AAAAAAAAAA 
701 AAAAAAAAAA AAAAAAG 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 15 bp to 572 bp; peptide length: 186 
Category: strong similarity to known protein 
Classification: intracellular transport and traffic 
Prosite motifs: ATP_GTP_A (24-32) 
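
The peptide listed below can be reproduced by translating the insert from the stated ORF start. The following is a minimal sketch using the standard genetic code; the function name `translate` is hypothetical, and only the first 50 bases of the insert are used to keep the example short.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start):
    """Translate seq from 1-based position start until a stop codon or the end."""
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i : i + 3].upper(), "X")  # X for unknown codons
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 1-50 of the insert as printed above; the ORF starts at bp 15.
first_50 = "ATTTGAATCACATTATGGGATTGCTAGACAGACTTTCAGTCTTGCTTGGC"
print(translate(first_50, 15))
```

The twelve residues obtained this way correspond to the start of the peptide sequence that follows.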



1 MGLLDRLSVL LGLKKKEVHV LCLGLDNSGK TTIINKLKPS NAQSQNILPT 

51 IGFSIEKFKS SSLSFTVFDM SGQGRYRNLW EHYYKEGQAI IFVIDSSDRL 

101 RMVVAKEELD TLLNHPDIKH RRIPILFFAN KMDLRDAVTS VKVSQLLCLE 

151 NIKDKPWHIC ASDAIKGEGL QEGVDWLQDQ IQTVKT 
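
As a cross-check on the peptide above, its average molecular weight can be estimated by summing residue masses and adding one water. This is a minimal sketch, assuming standard average residue masses; how the Pedant report for this frame arrives at its [MW] value of 21097.69 is not stated in the source. The function name `molecular_weight` is hypothetical.

```python
# Average residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

PEPTIDE = (
    "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPT"
    "IGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRL"
    "RMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLCLE"
    "NIKDKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT"
)


def molecular_weight(peptide):
    """Average molecular weight in Da of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


print(len(PEPTIDE), round(molecular_weight(PEPTIDE), 2))
```

The estimate lands within a fraction of a dalton of the reported value; small differences would be expected from the mass table used.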



BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_23111, frame 3 

TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds., N = 1, Score = 923, P = 1.1e-92 

TREMBL:CEC38D4_5 gene: "C38D4.8"; Caenorhabditis elegans cosmid C38D4, 
N = 1, Score = 418, P = 3.6e-39 

PIR:S66337 ADP-ribosylation factor 1 - Chlamydomonas reinhardtii, N = 
1, Score = 373, P = 2.1e-34 

SWISSPROT:ARF1_CHLRE ADP-RIBOSYLATION FACTOR 1., N = 1, Score = 372, P 
= 2.7e-34 



>TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 

homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds. 
Length = 186 

HSPs: 



Score = 923 (138.5 bits), Expect = 1.1e-92, P = 1.1e-92 
Identities = 178/186 (95%), Positives = 184/186 (98%) 



Query: 1 MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 60 

MGLLDRLS LLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQ+I+PTIGFSIEKFKS 
Sbjct: 1 MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS 60 

Query: 61 SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 120 

SSLSFTVFDMSGQGRYRNLWEHYYK+GQAIIFVIDSSD+LRMVVAKEELDTLLNHPDIKH 
Sbjct: 61 SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH 120 

Query: 121 RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 

RRIPILFFANKMDLRD+VTSVKVSQLLCLE+IKDKPWHICASDAIKGEGLQEGVDWLQDQ 
Sbjct: 121 RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 

Query: 181 IQTVKT 186 

IQ VKT 
Sbjct: 181 IQAVKT 186 
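
The identity count reported for this alignment can be recomputed from the Query and Sbjct rows. A minimal sketch for an ungapped alignment follows; the function name `identity` is hypothetical, and the percentage is truncated rather than rounded, which is an assumption made because it reproduces the 95% shown in the report. The Positives count is not reproduced here since it depends on a scoring matrix.

```python
QUERY = (
    "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS"
    "SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH"
    "RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ"
    "IQTVKT"
)
SUBJECT = (
    "MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS"
    "SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH"
    "RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ"
    "IQAVKT"
)


def identity(query, subject):
    """Return (matches, aligned_length, truncated_percent) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query), matches * 100 // len(query)


print(identity(QUERY, SUBJECT))
```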





Pedant information for DKFZphtes3_23111, frame 3 



Report for DKFZphtes3_23111.3 



[LENGTH] 186 

[MW] 21097.69 

[pI] 8.72 

[HOMOL] TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor homolog 

ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 (Arl6) mRNA, complete cds. 4e-94 



[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL137w] 2e-36 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] r general function prediction [M. jannaschii, MJ1339] 2e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-04 
[BLOCKS] BL01288C 
[BLOCKS] BL01020C SAR1 family proteins 
[BLOCKS] BL01019C ADP-ribosylation factors family proteins 
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t BLOCKS) BL01019B ADP-ribosylation factors family proteins 

(BLOCKS) BL01019A ADP-ribosylation factors family proteins 

(SCOP) dlas3_2 3.29.1,4,12 Transducin (alpha subunit), insertion domai 2e-45 

(SCOP] dlmhl 3.29.1,4.2 Racl [Human (Homo sapiens) 2e-46 

(SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 5e-37 

(SCOP] dlhura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Horn 4e-61 

[SCOP] dla2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) [Do 4e-33 

[PIRKW] glycoprotein 2e-33 

[PIRKW] monomer 3e-31 

[PIRKW] P-loop 2e-35 

[PIRKW) lipoprotein 2e-33 

[PIRKW] GTP binding 2e-35 

[SUPFAM] ADP-ribosylation factor 2e-35 

[PROSITE] ATP_GTP_A 1 

( PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 5.91 % 

SEQ MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS
SEG . .xxxxxxxxxxx
1hurA CCCCEEEEEETTTTCHHHHHHHHCCCCEEEE--EEETTEEEEEEEE

SEQ SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH
SEG
1hurA TTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHHTTTT--

SEQ RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ
SEG
1hurA TTTEEEEEEETTTTTTTCCHHHHHHHHCGGGTTTTCEEEEECBTTTTBTHHHHHHHHHHH

SEQ IQTVKT
SEG
1hurA HHHHC.
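
The LOW_COMPLEXITY value in the report appears to be the share of residues masked with `x` on the SEG lines. This reading is an inference from the numbers, not something the report states. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the count of 11 masked residues is tallied from the first SEG line as printed.

```python
def low_complexity_percent(masked_residues, length):
    """Percentage of residues masked as low complexity, to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= masked_residues <= length:
        raise ValueError("masked_residues must lie between 0 and length")
    return round(100.0 * masked_residues / length, 2)


# 11 masked residues in a 186-residue peptide.
print(low_complexity_percent(11, 186))  # prints 5.91
```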



Prosite for DKFZphtes3_23111.3
PS00017 24->32 ATP_GTP_A PDOC00017
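
The ATP_GTP_A hit corresponds to the PROSITE P-loop pattern `[AG]-x(4)-G-K-[ST]`, which can be written as a regular expression. The sketch below is illustrative only; `find_p_loop` is a hypothetical helper, and it returns 1-based inclusive coordinates. Under that convention the eight-residue motif spans 24 to 31, so the end coordinate of 32 printed in the report presumably follows a different convention (an assumption).

```python
import re

# PROSITE PS00017 (ATP_GTP_A): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loop(sequence):
    """Return (start, end) of the first P-loop match, 1-based inclusive."""
    match = P_LOOP.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.end()


# First 60 residues of the peptide shown in the SEQ lines of this report.
N_TERM = "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS"

print(find_p_loop(N_TERM))  # prints (24, 31)
```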



Pfam for DKFZphtes3_23111.3

HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM * GMgWf s I Fr JcMWGlWNKEMRI LMLGLDNAGKTTI LYMLKlgE . . IVTTI
 MG++ ++ ++GL +KE+++L LGLDN+GKTTI+++LK+ ++
Query 1 -MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNIL 48

HMM PTIGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaD
 PTIGF +E+ + ++F+V+D GQ + R +W HYY + ++II+V+DS+D
Query 49 PTIGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSD 98

HMM RDRMeEaKqELHaMLNEEEL . . rDAPlLIFANKQDLPgAMSesEIREaLG
 R RM AK+EL+ +LN+ ++ R+ P+L FANK DL++A+++ +++ +L
Query 99 RLRMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLC 148

HMM LHelRCnRPWYIQMCCAVtGEGLYEGMDWLSNYInkRkK*
 L++I+ + PW+I +++A++GEGL+EG DWL ++I+ K
Query 149 LENIK-DKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT 186



DKFZphtes3_23nl9



group: testes derived 

DKFZphtes3_23nl9 encodes a novel 387 amino acid protein with similarity to rat protein kinase
C-interacting RBCC protein 1.

The novel protein does not contain the RING-B box-coiled coil (RBCC) motif of RBCC protein 1 and
thus is not a member of this subgroup of RING finger proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to rat protein kinase C-interacting RBCC protein 1 

start at bp 209 matches Kozak consensus PyNNatgG
similarity of C-terminal part to N-terminus of RBCK1

Sequenced by LMU

Locus: unknown

Insert length: 1579 bp

Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515



1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 
51 TTGGCCAGAG TGCACCACTC GGCGCCGTCC CAGGCCCGAC GCTCTGGGCG 
101 CGCCCGGAAC CCCAGGTTCG CGGCCCGTGT TTCCGACCGG CGGAGGGGGC 
151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 
201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 
251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 
301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 
351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 
401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT 
451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TCCACCAGGA GGGCCTGGAA 
501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 
551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 
601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 
651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 
701 AGCCCTGGAA ACTTGACGGA GAGAGAAGAG CTGGCAGGGA GCCTGGCCCG 
751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 
801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 
851 CCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 
901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 
951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 
1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 
1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 
1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 
1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 
1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 
1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 
1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 
1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 
1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGCCCCCACT GAACTCCGGG 
1451 GACCTCTACT GACTGCTTGC TGGGACAGTC ACCAGGGTTG GGGGGAAGGG 
1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA 
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG 
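
The two 3' end annotations above (poly A stretch, polyadenylation signal) can be located with a simple scan of the last lines of the insert. The sketch below is illustrative only: the function name, the minimum run length of 10, and the choice of the hexamers AATAAA and ATTAAA as signals are assumptions, not the method used to produce the report. The reported positions match 0-based offsets into the sequence, which is also an inference.

```python
import re

# Bases 1501..1579 of the insert, copied from the listing above.
TAIL_OFFSET = 1500  # 0-based offset of the first base on the "1501" line
TAIL = "CCACAAAATGAAACCATTAAAGACCCTTAAGAGCC" + "A" * 43 + "G"

# Common polyadenylation signal hexamers (assumed search set).
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_features(tail, offset, min_run=10):
    """Return 0-based positions of the first signal hexamer and poly-A run."""
    hits = [tail.find(signal) for signal in SIGNALS if signal in tail]
    signal_pos = min(hits) + offset if hits else None
    run = re.search("A{%d,}" % min_run, tail)
    run_pos = run.start() + offset if run else None
    return signal_pos, run_pos


print(find_polya_features(TAIL, TAIL_OFFSET))  # prints (1515, 1535)
```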



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 209 bp to 1369 bp; peptide length: 387
Category: similarity to known protein
Classification: Cell signaling/communication



1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP
51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS
101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA
151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ
201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV
251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL
301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP
351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST
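
The ORF coordinates and the peptide length are tied together by simple arithmetic, and the first residues can be checked against the nucleotide listing. The following Python sketch is a minimal illustration under stated assumptions: the coordinates are taken as 1-based and inclusive and as excluding the stop codon (inferred from the numbers), the standard genetic code is used, and the helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate complete codons of a DNA string with the standard code."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 209..238 of the insert, copied from the nucleotide listing.
ORF_START = "ATGGCGCCGCCAGCGGGCGGGGCGGCGGCG"

print(orf_peptide_length(209, 1369))  # prints 387
print(translate(ORF_START))           # prints MAPPAGGAAA
```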

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1,
Score = 353, P = 2.8e-32

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2,
complete cds., N = 1, Score = 353, P = 2.8e-32

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P
= 8.5e-25

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score
= 367, P = 9.3e-34



>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds.
Length = 498

HSPs:



Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34
Identities = 95/212 (44%), Positives = 129/212 (60%)

Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234 

+A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRL V++EDA 
Sbjct: 1 MALSLARAVAGGDEQAAIKYATWLAEQRVPLRVQVKPEVSPTQDIRLCVSVEDAYM 56 

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294 

+ + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD 
Sbjct: 57 -HTVTIWLTVRPDMTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115 

Query: 295 AFLYLLSAPREAPATGPSPQH PQK MDGELG--RLFPPSLG-LPPG-PQPAASSLP 345 

A+LYLLSA T +PQ Q+ M +LG L S G L P P+P + P 

Sbjct: 116 AYLYLLSARN TSLNPQELQRQRQLRMLEDLGFKDLTLQSRGPLEPVLPKPRTNQEP 171 

Query: 346 -SPLQP--SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379

+P P W CP CTFIN P RPGCEMC RP T+ 
Sbjct: 172 GQPDAAPESPPVGWQCPGCTFINKPTRPGCEMCCRARPETY 212 
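
The HSP header prints the same value for Expect and P. Under the usual Poisson approximation for alignment statistics, P = 1 - exp(-E), so the two become indistinguishable when E is very small. Whether this report derived P in exactly this way is an assumption; the sketch below only illustrates the relationship, and the function name is hypothetical.

```python
import math


def p_from_expect(expect):
    """Probability of at least one chance hit given an expected count E."""
    # expm1 keeps precision for very small E, where 1 - exp(-E) would round to 0.
    return -math.expm1(-expect)


print(p_from_expect(9.3e-34))
print(p_from_expect(1.0))
```

For E = 9.3e-34 the result agrees with E to many significant digits, whereas for E = 1.0 it is about 0.63.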

Pedant information for DKFZphtes3_23nl9, frame 2 



Report for DKFZphtes3_23nl9.2

[LENGTH] 387
[MW] 39949.29
[pI] 5.53
[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus UbcM4 interacting protein 28 mRNA, complete cds. 1e-22
[BLOCKS] BL00578B
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 17.57 %

SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee

SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT

SEG 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEQ VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA 

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV 

SEG xxxxxxxxxxx. . 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQHPQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 



(No Prosite data available for DKFZphtes3_23nl9.2) 
(No Pfam data available for DKFZphtes3_23nl9.2)



DKFZphtes3_26g22



group: intracellular transport/trafficking 

DKFZphtes3_26g22 encodes a novel 898 amino acid protein with similarity to kinesins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a kinesin motor domain
signature. Kinesin is a microtubule-associated force-producing protein that plays a role in
organelle transport. It is an oligomeric complex composed of two heavy chains and two light
chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy
chain contains a large globular N-terminal domain which is responsible for the motor activity
of kinesin, which is known to hydrolyze ATP and to bind and move on microtubules. Several
proteins involved in chromosome segregation and cell division contain this motor domain, such
as Drosophila claret segregational protein (ncd), Drosophila kinesin-like protein (nod), human
CENP-E and human mitotic kinesin-like protein-1 (MKLP-1). The novel protein is a new
kinesin-like protein.

The new protein can find application in modulating chromosome transport in mitosis and meiosis
and modulation of cell division.



strong similarity to kinesins 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3032 bp 

No poly A stretch found, no polyadenylation signal found 



1 CTGAAGCGCT GGGAGGCGGA CATTAAAGTG AAGTGGTTGC GGTAACCTGG 
51 CCTGGGCCTG AAGTGAGTGA GAGGCACATG AAGAGAAGTA TTCAAGTATT 
101 TATACAGATA GGAATCAAGA TAATCAACAA TGTCTGTCAC TGAGGAAGAC 
151 CTGTGCCACC ATATGAAAGT AGTAGTTCGT GTACGTCCGG AAAACACTAA 
201 AGAAAAAGCA GCTGGATTTC ATAAAGTGGT TCATGTTGTG GATAAACATA 
251 TCCTAGTTTT TGATCCCAAA CAAGAAGAAG TCAGTTTTTT CCATGGAAAG 
301 AAAACTACAA ATCAAAATGT TATAAAGAAA CAAAATAAGG ATCTTAAATT 
351 TGTATTTGAT GCTGTTTTTG ATGAAACGTC AACTCAGTCA GAAGTTTTTG 
401 AACACACTAC TAAGCCAATT CTTCGTAGTT TTTTGAATGG ATATAATTGC 
451 ACAGTACTTG CCTATGGTGC CACTGGTGCT GGGAAGACCC ACACTATGCT 
501 AGGATCAGCT GATGAACCTG GAGTGATGTA TCTAACAATG TTACACCTTT 
551 ACAAATGCAT GGATGAGATT AAAGAAGAGA AAATATGTAG TACTGCAGTT 
601 TCATATCTGG AGGTATATAA TGAACAGATT CGTGATCTCT TAGTAAATTC 
651 AGGGCCACTT GCTGTCCGGG AAGATACCCA AAAAGGGGTG GTCGTTCATG 
701 GACTTACTTT ACACCAGCCC AAATCCTCAG AAGAAATTTT ACATTTATTG 
751 GATAATGGAA ACAAAAACAG GACACAACAT CCCACTGATA TGAATGCCAC 
801 ATCTTCTCGT TCTCATGCTG TTTTCCAAAT TTACTTGCGA CAACAAGACA 
851 AAACAGCAAG TATCAATCAA AATGTCCGTA TTGCCAAGAT GTCACTCATT 
901 GACCTGGCAG GATCTGAGCG AGCAAGTACT TCCGGTGCTA AGGGGACCCG 
951 ATTTGTAGAA GGCACAAATA TTAATAGATC ACTTTTAGCT CTTGGGAATG 
1001 TCATCAATGC CTTAGCAGAT TCAAAGAGAA AGAATCAGCA TATCCCTTAC 
1051 AGAAATAGTA AGCTTACTCG CTTGTTAAAG GATTCTCTTG GAGGAAACTG 
1101 TCAAACTATA ATGATAGCTG CTGTTAGTCC TTCCTCTGTA TTCTACGATG 
1151 ACACATATAA CACTCTTAAG TATGCTAACC GGGCAAAGGA CATTAAATCT 
1201 TCTTTGAAGA GCAATGTTCT TAATGTCAAT AATCATATAA CTCAATATGT 
1251 AAAGATCTGT AATGAGCAGA AGGCAGAGAT TTTATTGTTA AAAGAAAAAC 
1301 TAAAAGCCTA TGAAGAACAG AAAGCCTTCA CTAATGAAAA TGACCAAGCA 
1351 AAGTTAATGA TTTCAAACCC TCAGGAAAAA GAAATCGAAA GGTTTCAAGA 
1401 AATCCTGAAC TGCTTGTTCC AGAATCGAGA AGAAATTAGA CAAGAATATC 
1451 TGAAGTTGGA AATGTTACTT AAAGAAAATG AACTTAAATC ATTCTACCAA 
1501 CAACAGTGCC ATAAACAAAT AGAAATGATG TGTTCTGAAG ACAAAGTAGA 
1551 AAAGGCCACT GGAAAACGAG ATCATAGACT TGCAATGTTG AAAACTCGTC 
1601 GCTCCTACCT GGAGAAAAGG AGGGAGGAGG AATTGAAGCA ATTTGATGAG 
1651 AATACTAATT GGCTCCATCG TGTCGAAAAA GAAATGGGAC TCTTAAGTCA 
1701 AAACGGTCAT ATTCCAAAGG AACTCAAGAA AGATCTTCAT TGTCACCATT 
1751 TGCACCTCCA GAACAAAGAT TTGAAAGCAC AAATTAGACA TATGATGGAT 
1801 CTAGCTTGTC TTCAGGAACA GCAACACAGG CAGACTGAAG CAGTATTGAA 
1851 TGCTTTACTT CCAACCCTAA GAAAACAATA TTGCACATTA AAAGAAGCCG 
1901 GCCTGTCAAA TGCTGCTTTT GAATCTGACT TCAAAGAGAT CGAACATTTG 
1951 GTAGAGAGGA AAAAAGTGGT AGTTTGGGCT GACCAAACTG CCGAACAACC 
2001 AAAGCAAAAC GATCTACCAG GGATTTCTGT TCTTATGACC TTTCCACAAC 
2051 TTGGACCAGT TCAGCCTATT CCTTGTTGCT CATCTTCAGG TGGAACTAAT 
2101 CTGGTTAAGA TTCCTACAGA AAAAAGAACT CGGAGAAAAC TAATGCCATC 
2151 TCCCTTGAAA GGACAGCATA CTCTAAAGTC TCCACCATCT CAAAGTGTGC 
2201 AGCTCAATGA TTCTCTTAGC AAAGAACTTC AGCCTATTGT ATATACACCA 
2251 GAAGACTGTA GAAAAGCTTT TCAAAATCCG TCTACAGTAA CCTTAATGAA 
2301 ACCATCATCA TTTACTACAA GTTTTCAGGC TATCAGCTCA AACATAAACA 
2351 GTGATAATTG TCTGAAAATG TTGTGTGAAG TAGCTATCCC TCATAATAGA
2401 AGAAAAGAAT GTGGACAGGA GGACTTGGAC TCTACATTTA CTATATGTGA
2451 AGACATCAAG AGCTCGAAGT GTAAATTACC CGAACAAGAA TCACTACCAA
2501 ATGATAACAA AGACATTTTA CAACGGCTTG ATCCTTCTTC ATTCTCAACT
2551 AAGCATTCTA TGCCTGTACC AAGCATGGTG CCATCCTACA TGGCAATGAC
2601 TACTGCTGCC AAAAGGAAAC GGAAATTAAC AAGTTCTACA TCAAACAGTT
2651 CGTTAACTGC AGACGTAAAT TCTGGATTTG CCAAACGTGT TCGACAAGAT
2701 AATTCAAGTG AGAAGCACTT ACAAGAAAAC AAACCAACAA TGGAACATAA
2751 AAGAAACATC TGTAAAATAA ATCCAAGCAT GGTTAGAAAA TTTGGAAGAA
2801 ATATTTCAAA AGGAAATCTA AGATAAATCA CTTCAAAACC AAGCAAAATG
2851 AAGTTGATCA AATCTGCTTT TCAAAGTTTA TCAATACCCT TTCAAAAATA
2901 TATTTAAAAT CTTTGAAAGA AGACCCATCT TAAAGCTAAG TTTACCCAAG
2951 TACTTTCAGC AAGCAGAAAA ATGAAACTCT TTGTTTTCTT CTTTTGTGTT
3001 CTAAAAAAAT AAAATTTCAA AAGAAAAAAA AA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 130 bp to 2823 bp; peptide length: 898 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (113-121)
KINESIN_MOTOR_DOMAIN1 (252-264)



1 MSVTEEDLCH HMKVVVRVRP ENTKEKAAGF HKVVHVVDKH ILVFDPKQEE
51 VSFFHGKKTT NQNVIKKQNK DLKFVFDAVF DETSTQSEVF EHTTKPILRS
101 FLNGYNCTVL AYGATGAGKT HTMLGSADEP GVMYLTMLHL YKCMDEIKEE
151 KICSTAVSYL EVYNEQIRDL LVNSGPLAVR EDTQKGVVVH GLTLHQPKSS
201 EEILHLLDNG NKNRTQHPTD MNATSSRSHA VFQIYLRQQD KTASINQNVR
251 IAKMSLIDLA GSERASTSGA KGTRFVEGTN INRSLLALGN VINALADSKR
301 KNQHIPYRNS KLTRLLKDSL GGNCQTIMIA AVSPSSVFYD DTYNTLKYAN
351 RAKDIKSSLK SNVLNVNNHI TQYVKICNEQ KAEILLLKEK LKAYEEQKAF
401 TNENDQAKLM ISNPQEKEIE RFQEILNCLF QNREEIRQEY LKLEMLLKEN
451 ELKSFYQQQC HKQIEMMCSE DKVEKATGKR DHRLAMLKTR RSYLEKRREE
501 ELKQFDENTN WLHRVEKEMG LLSQNGHIPK ELKKDLHCHH LHLQNKDLKA
551 QIRHMMDLAC LQEQQHRQTE AVLNALLPTL RKQYCTLKEA GLSNAAFESD
601 FKEIEHLVER KKVVVWADQT AEQPKQNDLP GISVLMTFPQ LGPVQPIPCC
651 SSSGGTNLVK IPTEKRTRRK LMPSPLKGQH TLKSPPSQSV QLNDSLSKEL
701 QPIVYTPEDC RKAFQNPSTV TLMKPSSFTT SFQAISSNIN SDNCLKMLCE
751 VAIPHNRRKE CGQEDLDSTF TICEDIKSSK CKLPEQESLP NDNKDILQRL
801 DPSSFSTKHS MPVPSMVPSY MAMTTAAKRK RKLTSSTSNS SLTADVNSGF
851 AKRVRQDNSS EKHLQENKPT MEHKRNICKI NPSMVRKFGR NISKGNLR



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_26g22, frame 1

SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13., N = 3,
Score = 874, P = 9e-93

TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds., N = 1, Score
= 880, P = 4.2e-88

TREMBL:SPBC649_1 gene: "SPBC649.01c"; product: "putative kinesin-like
protein"; S.pombe chromosome II cosmid c649., N = 3, Score = 814, P =
9.8e-86

PIR:S64238 kinesin-related protein KIP3 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 802, P = 2.5e-83



>TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds.
Length = 814

HSPs:

Score = 880 (132.0 bits), Expect = 4.2e-88, P = 4.2e-88
Identities = 181/345 (52%), Positives = 238/345 (68%)

Query: 11 HMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFF-HGKKTTNQNVIKKQN 69 

++KV VRVRP N +E ++ V+D+ L+FDP +E+ FF G K +++ K+ N 

Sbjct: 8 NIKVAVRVRPYNVRELEQKQRSIIKVMDRSALLFDPDEEDDEFFFQGAKQPYRDITKRMN 67 

Query: 70 KDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKTHTMLGSADE 129

K L FD VFD ++ ++FE T P++ + LNGYNC+V YGATGAGKT TMLGS 
Sbjct: 68 KKLTMEFDRVFDIDNSNQDLFEECTAPLVDAVLNGYNCSVFVYGATGAGKTFTMLGSEAH 127 

Query: 130 PGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVREDTQKGVVV 189 

PG+ YLTM L+ + + + VSYLEVYNE + +LL SGPL +RED GVVV 

Sbjct: 128 PGLTYLTMQDLFDKIQAQSDVRKFDVGVSYLEVYNEHVMNLLTKSGPLKLREDNN-GVVV 186 

Query: 190 HGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNV 249

GL L S+EE+L +L GN +RTQHPTD NA SSRSHA+FQ+++R ++ + V 

Sbjct: 187 SGLCLTPIYSAEELLRMLMLGNSHRTQHPTDANAESSRSHAIFQVHIRITERKTDTKRTV 246

Query: 250 RIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKRKNQHIPYRN 309 

K+S+IDLAGSERA+++ G RF EG +IN+SLLALGN IN LAD + HIPYR+ 
Sbjct: 247 KLSMIDLAGSERAASTKGIGVRFKEGASINKSLLALGNCINKLADGLK HIPYRD 300 

Query: 310 SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI 355 

S LTR+LKDSLGGNC+T+M+A VS SS+ Y+DTYNTLKYA+RAK I 
Sbjct: 301 SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI 346 



Pedant information for DKFZphtes3_26g22, frame 1 



Report for DKFZphtes3_26g22.1

[LENGTH] 898
[MW] 102281.63
[pI] 9.09
[HOMOL] SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13. 3e-97
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKL079w] 4e-28
[BLOCKS] BL00411H
[BLOCKS] BL00411G
[BLOCKS] BL00411F
[BLOCKS] BL00411E Kinesin motor domain proteins
[BLOCKS] BL00411C Kinesin motor domain proteins
[BLOCKS] BL00411B Kinesin motor domain proteins
[BLOCKS] BL00411A Kinesin motor domain proteins
[SCOP] d2kin.1 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 1e-117
[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 1e-112
[PIRKW] nucleus 6e-87
[PIRKW] heterodimer 4e-68
[PIRKW] DNA binding 9e-60
[PIRKW] heterotetramer 2e-54
[PIRKW] mitosis 9e-60
[PIRKW] microtubule binding 4e-68
[PIRKW] ATP 6e-87
[PIRKW] phosphoprotein 5e-59
[PIRKW] heterotrimer 4e-68
[PIRKW] purine nucleotide binding 1e-26
[PIRKW] P-loop 6e-87
[PIRKW] coiled coil 4e-68
[PIRKW] heptad repeat 3e-62
[PIRKW] methylated amino acid 2e-54
[PIRKW] hydrolase 2e-54
[PIRKW] GTP binding 1e-60
[PIRKW] cell division 5e-57
[SUPFAM] kinesin-related protein KIP1 3e-50
[SUPFAM] kinesin-related protein CIN8 7e-33
[SUPFAM] kinesin heavy chain 2e-54
[SUPFAM] suppressor protein SMY1 1e-26
[SUPFAM] kinesin-related protein KIF3 4e-68
[SUPFAM] kinesin-related protein KIF2 1e-46
[SUPFAM] kinesin-related protein unc-104 7e-60
[SUPFAM] unassigned kinesin-related proteins 6e-87
[SUPFAM] centromere protein E 3e-54
[SUPFAM] kinesin-related protein KLP61F 5e-57
[SUPFAM] kinesin-related protein MKLP-1 2e-28
[SUPFAM] pleckstrin repeat homology 7e-60
[SUPFAM] kinesin-related protein KIF1B 4e-61
[SUPFAM] kinesin motor domain homology 6e-87
[SUPFAM] kinesin-related protein KLPA 1e-43
[SUPFAM] kinesin-related protein nodA 1e-30
[SUPFAM] kinesin-related protein Eg5 5e-59
[PROSITE] ATP_GTP_A 1
[PROSITE] KINESIN_MOTOR_DOMAIN1 1
[PFAM] Kinesin motor domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 8.57 %



SEQ MSVTEEDLCHHMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFFHGKKTT
SEG
3kar- ........ 1 .......... .V. ... ... * ! V. " TBEEE

SEQ NQNVIKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKT
SEG
3kar- EEEETTTTTTEEEEEETEEETTTTCHHHHHHHHHH-HHHGGGGCCCEEEEEECTTTTCHH

SEQ HTMLGSADEPGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVR
SEG
3kar- HHHHTTTT--THHHHHHHHHHHHHHHHGGGCEEEEEEEEEEEETTEEEETT-TCCCCEEE

SEQ EDTQKGVVVHGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQD
SEG
3kar- EETTTEEEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEEEEE

SEQ KTASINQNVRIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKR
SEG
3kar- TTTTCEE EEEEEEEECCCCCCCCCC HHHHHHHHHHHHHHHHHHHHHHHHTTTT

SEQ KNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDIKSSLK
SEG xxxxx
3kar- TTTCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHHH

SEQ SNVLNVNNHITQYVKICNEQKAEILLLKEKLKAYEEQKAFTNENDQAKLMISNPQEKEIE
SEG xxxxxxxx xxxxxxxxxxxxxxxxxxxxx
3kar-

SEQ RFQEILNCLFQNREEIRQEYLKLEMLLKENELKSFYQQQCHKQIEMMCSEDKVEKATGKR
SEG xxxxxxxxxxxxx
3kar-

SEQ DHRLAMLKTRRSYLEKRREEELKQFDENTNWLHRVEKEMGLLSQNGHIPKELKKDLHCHH
SEG xxxxxxxxxxx
3kar-

SEQ LHLQNKDLKAQIRHMMDLACLQEQQHRQTEAVLNALLPTLRKQYCTLKEAGLSNAAFESD
SEG xxx
3kar-

SEQ FKEIEHLVERKKVVVWADQTAEQPKQNDLPGISVLMTFPQLGPVQPIPCCSSSGGTNLVK
SEG
3kar-

SEQ IPTEKRTRRKLMPSPLKGQHTLKSPPSQSVQLNDSLSKELQPIVYTPEDCRKAFQNPSTV
SEG
3kar-

SEQ TLMKPSSFTTSFQAISSNINSDNCLKMLCEVAIPHNRRKECGQEDLDSTFTICEDIKSSK
SEG
3kar-

SEQ CKLPEQESLPNDNKDILQRLDPSSFSTKHSMPVPSMVPSYMAMTTAAKRKRKLTSSTSNS
SEG xxxxxxxxxxxxx
3kar-

SEQ SLTADVNSGFAKRVRQDNSSEKHLQENKPTMEHKRNICKINPSMVRKFGRNISKGNLR
SEG xxx
3kar-



Prosite for DKFZphtes3_26g22.1

PS00017 113->121 ATP_GTP_A PDOC00017

PS00411 252->264 KINESIN_MOTOR_DOMAIN1 PDOC00343



Pfam for DKFZphtes3_26g22.1
HMM_NAME Kinesin motor domain

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds 

R+RP N +E+++G +VV + + + + +++E S 
Query 17 RVRPENTKEKAAGFHKVVHVVD-KHILVFDPKQEEVSFFHGKKTTNQNV 64 

HMM phksFtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTIFAYGQ 

+ F+FD VF+ ++TQ +V++ + PI+ ++++GYNCT++AYG 
Query 65 IKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGA 114

HMM TGSGKTYTMMGpggehPDHmGIIPRcCHDIFdr Idkf qekDhdFWhVkCS 
TG+GKT+TM G + D+ G+ + +++++ D + + + +S 
Query 115 TGAGKTHTMLG SADEPGVMYLTMLHLYKCMDEIK-EEKIC-STAVS 158 

HMM YMEI YNEelYDLLCPnPqhMkpLnlHEHPNMGpYVqGCTEf HVcSYeDac 
Y+E+YNE+I+DLL+ N ++PL+++E+ G+ V G+T+ +S E+++ 
Query 159 YLEVYNEQIRDLLV-N SGPLAVREDTQKGVVVHGLTLHQPKSSEEIL 204 

HMM hWIWqGnknRHVAaTnMNdhSSRSHtl FTIHVeQrHk . . qcdehvcHSKM 

H+++ GNKNR+ +T MN++SSRSH++F+I ++Q K + V++ KM 
Query 205 HLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNVRIAKM 254

HMM NLVDLAGSERvnrTGAEGQRlKEGcNINqSLttLGnVInaLaDgqTKYmY 

+L+DLAGSER++ +GA G+R+ EG+NIN+SL++LGNVINALAD + 
Query 255 SLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSK 299 

HMM gghgHIPYRDSKLTWILQDSLGGNcKTcMIACIWPadWNYEETLSTLRYA 
+++HIPYR SKLT+LL+DSLGGNC T MIA+++P+ + Y++T +TL+YA 
Query 300 RKNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYA 349

HMM dRAKnlkNkPQINEDPcamalWRrYheQIqdMKhqL* 

+RAK+IK +N++ ++Y+ + K++ 
Query 350 NRAKDIKSSLKSNVLNVN-NHITQYVKICNEQKAEI 384
DKFZphtes3_27dl



group: metabolism 

DKFZphtes3_27dl encodes a novel 712 amino acid protein similar to ubiquitin-specific proteases
(EC 3.1.2.15).

The novel protein contains both a ubiquitin carboxyl-terminal hydrolases family 2 signature 1
and signature 2. Pfam predicts a new member of the ubiquitin carboxyl-terminal hydrolases
family 2. The ubiquitin system is responsible for the turnover of proteins. Ubiquitin
carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well
as that of ubiquitinated proteins.

The novel protein is a new member of the ubiquitin carboxyl-terminal hydrolases family 2,
represented by proteins such as yeast UBP1-16, human tre-2, human isopeptidase T and others.

The novel protein can find application in modulation of ubiquitin and protein metabolism in
cells.



similarity to ubiquitin-specific proteases

complete cDNA, complete cds, 4 EST hits

Sequenced by GBF

Locus: unknown

Insert length: 2871 bp 

Poly A stretch at pos. 2836, no polyadenylation signal found 



1 CCAAACCTGA AAGAGGTTGA TTTGTAATGA TTTGCAGGGG GGCACTGGAG 
51 GCAGCGGCCA GGACTTTTCA CTTAGGAGAT CAGCATTTGC CCTGATGGAA 
101 ACTGGGCGAT CCTGCAGGGA CTGACCTCTG AGTTATCCAA AGGCCGACCT 
151 GGGGAAAGAC TGATTTTGAG GTTTTAATAG TTTTCAGATG CTTCAAGTGT 
201 TGTGAACAGA GACTTGTTTG GATTATGCAT TTCTCAGCTA GACTAAATAA 
251 ATGCTAGCAA TGGATACGTG CAAACATGTT GGGCAGCTGC AGCTTGCTCA 
301 AGACCATTCC AGCCTCAACC CTCAGAAATG GCACTGTGTG GACTGCAACA 
351 CGACCGAGTC CATTTGGGCT TGCCTTAGCT GCTCCCATGT TGCCTGTGGA 
401 AGATATATTG AAGAGCATGC ACTCAAGCAC TTTCAAGAAA GCAGTCATCC 
451 TGTTGCATTG GAGGTGAATG AGATGTACGT TTTTTGTTAC CTTTGTGATG 
501 ATTATGTTCT GAATGATAAC GCAACTGGAG ACCTGAAGTT ACTACGACGT 
551 ACATTAAGTG CCATCAAAAG TCAAAATTAT CACTGCACAA CTCGTAGTGG 
601 GAGGTTTTTA CGGTCCATGG GTACAGGTGA TGATTCTTAT TTCTTACATG 
651 ACGGTGCCCA ATCTCTGCTT CAAAGTGAAG ATCAACTGTA TACTGCTCTT 
701 TGGCACAGGA GAAGGATACT AATGGGTAAA ATCTTTCGAA CATGGTTTGA 
751 ACAATCACCC ATTGGAAGAA AAAAGCAAGA AGAACCATTT CAGGAGAAAA 
801 TAGTAGTAAA AAGAGAAGTA AAGAAAAGAC GGCAGGAATT GGAGTATCAA 
851 GTTAAAGCAG AATTGGAAAG TATGCCTCCA AGAAAGAGTT TACGTTTACA 
901 AGGGCTCGCT CAGTCGACCA TAATAGAAAT AGTTTCTGTT CAGGTGCCAG 
951 CACAAACGCC AGCATCACCA GCAAAAGATA AAGTACTCTC TACCTCAGAA 
1001 AATGAAATAT CTCAAAAAGT CAGTGACTCC TCAGTTAAAC GAAGGCCAAT 
1051 AGTAACTCCT GGTGTAACAG GATTGAGAAA TTTGGGAAAT ACTTGCTATA 
1101 TGAATTCTGT TCTTCAGGTG TTGAGTCATT TACTTATTTT TCGACAATGT 
1151 TTTTTAAAGC TTGATCTGAA CCAATGGCTG GCTATGACTG CTAGCGAGAA 
1201 GACAAGATCT TGTAAGCATC CACCAGTCAC AGATACAGTA GTATATCAAA 
1251 TGAATGAATG TCAGGAAAAA GATACAGGTT TTGTTTGCTC CAGACAATCA 
1301 AGTCTGTCAT CAGGACTAAG TGGTGGAGCA TCAAAAGGTA GAAAGATGGA 
1351 ACTTATTCAG CCAAAGGAGC CAACTTCACA GTACATTTCT CTTTGTCATG 
1401 AATTGCATAC TTTGTTCCAA GTCATGTGGT CTGGAAAGTG GGCGTTGGTC 
1451 TCACCATTTG CTATGCTACA CTCAGTGTGG AGACTCATTC CTGCCTTTCG 
1501 TGGTTACGCC CAACAAGACG CTCAGGAATT TCTTTGTGAA CTTTTAGATA 
1551 AAATACAACG TGAATTAGAG ACAACTGGTA CCAGTTTACC AGCTCTTATC 
1601 CCCACTTCTC AAAGGAAACT CATCAAACAA GTTCTGAATG TTGTAAATAA 
1651 CATTTTTCAT GGACAACTTC TTAGTCAGGT TACATGTCTT GCATGTGACA 
1701 ACAAATCAAA TACCATAGAA CCTTTCTGGG ACTTGTCATT GGAGTTTCCA 
1751 GAAAGGTATC AATGCAGTGG AAAAGATATT GCTTCCCAGC CATGTCTGGT 
1801 TACTGAAATG TTGGCCAAAT TTACAGAAAC TGAAGCTTTA GAAGGAAAAA 
1851 TCTACGTATG TGACCAGTGT AACTCAAAGC GTAGAAGGTT TTCCTCCAAA 
1901 CCAGTTGTAC TCACAGAAGC CCAGAAACAA CTTATGATAT GCCACCTACC 
1951 TCAGGTTCTC AGACTGCACC TCAAACGATT CAGGTGGTCA GGACGTAATA 
2001 ACCGAGAGAA GATTGGTGTT CATGTTGGCT TTGAGGAAAT CTTAAACATG 
2051 GAGCCCTATT GCTGCAGGGA GACCCTGAAA TCCCTCAGAC CAGAATGCTT 
2101 TATCTATGAC TTGTCCGCGG TGGTGATGCA CCATGGGAAA GGATTTGGCT 
2151 CAGGGCACTA CACTGCCTAC TGCTATAATT CTGAAGGAGG GTTCTGGGTA 
2201 CACTGCAATG ATTCCAAACT AAGCATGTGC ACTATGGATG AAGTATGCAA 
2251 GGCTCAAGCT TATATCTTGT TTTATACCCA ACGAGTTACT GAGAATGGAC 
2301 ATTCTAAACT TTTGCCTCCA GAGCTCCTGT TGGGGAGCCA ACATCCCAAT 

2351 GAAGACGCTG ATACCTCGTC TAATGAAATC CTTAGCTGAT CCAAAGACAA 

2401 TGGGGTTTTC TTCCTGTGAT TTATATATAT ACTTTTTAAA AGACTGATGT 

2451 ACCATTTTAA ACTTCATTTT TTCTTGTGAA TCAGTGTATA CTACATTTAT 

2501 ACATTTTATA TCTAACAATT TTTTTTTTTT ACAAAGTATA AATGTATATA 

2551 TCAACTGAAG GTAACTACTT TTTTCATATT TGGAGTTTTA AACTTTTGGT 

2601 GTTTACCTCA GACTGATGTT ACCTCTTTTA TATTTTTATG TCTTAATTGG 

2651 CTCGGATGAT GAACTTGTGC AATCTTCTAC CAACAAAGTT CAAGTGGCAT 

2701 CATTTTATAT ACATGTATCT TTTTCAGGTA TTTTCTATAC AAATTCTTAA 

2751 TAGATGGAAA ATTAGACTCT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA 

2851 AAAAAAAAAA AAAAAAAAAG G 



BLAST Results 



No BLAST result 



Medline entries 



98072201: 

Regulation of ubiquitin-dependent processes by deubiquitinating enzymes. 

98431658: 

The ubiquitin system. 



Peptide information for frame 2 



ORF from 251 bp to 2386 bp; peptide length: 712 
Category: similarity to known protein 
Prosite motifs: UCH_2_1 (274-290) 
UCH_2_2 (619-638) 



1 MLAMDTCKHV GQLQLAQDHS SLNPQKWHCV DCNTTESIWA CLSCSHVACG 
51 RYIEEHALKH FQESSHPVAL EVNEMYVFCY LCDDYVLNDN ATGDLKLLRR 
101 TLSAIKSQNY HCTTRSGRFL RSMGTGDDSY FLHDGAQSLL QSEDQLYTAL 
151 WHRRRILMGK IFRTWFEQSP IGRKKQEEPF QEKIVVKREV KKRRQELEYQ 
201 VKAELESMPP RKSLRLQGLA QSTIIEIVSV QVPAQTPASP AKDKVLSTSE 
251 NEISQKVSDS SVKRRPIVTP GVTGLRNLGN TCYMNSVLQV LSHLLIFRQC 
301 FLKLDLNQWL AMTASEKTRS CKHPPVTDTV VYQMNECQEK DTGFVCSRQS 
351 SLSSGLSGGA SKGRKMELIQ PKEPTSQYIS LCHELHTLFQ VMWSGKWALV 
401 SPFAMLHSVW RLIPAFRGYA QQDAQEFLCE LLDKIQRELE TTGTSLPALI 
451 PTSQRKLIKQ VLNVVNNIFH GQLLSQVTCL ACDNKSNTIE PFWDLSLEFP 
501 ERYQCSGKDI ASQPCLVTEM LAKFTETEAL EGKIYVCDQC NSKRRRFSSK 
551 PVVLTEAQKQ LMICHLPQVL RLHLKRFRWS GRNNREKIGV HVGFEEILNM 
601 EPYCCRETLK SLRPECFIYD LSAVVMHHGK GFGSGHYTAY CYNSEGGFWV 
651 HCNDSKLSMC TMDEVCKAQA YILFYTQRVT ENGHSKLLPP ELLLGSQHPN 
701 EDADTSSNEI LS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27dl, frame 2 

PIR:S57591 hypothetical protein YMR223w - yeast (Saccharomyces 
cerevisiae), N = 4, Score = 218, P = 8.4e-38 

SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 
3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING 
PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055)., N = 2, Score = 
300, P = 9.3e-31 

TREMBL:AF079565_1 gene: "Ubp41"; product: "ubiquitin-specific protease 
UBP41"; Mus musculus ubiquitin-specific protease UBP41 (Ubp41) mRNA, 
complete cds., N = 3, Score = 187, P = 8.7e-30 

PIR:158376 hypothetical protein unp - mouse, N = 3, Score = 214, P = 
1.2e-28 
>SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) 
(DEUBIQUITINATING ENZYME 11) (KIAA0055). 
Length = 1,118 

HSPs: 

Score = 300 (45.0 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 95/301 (31%), Positives = 149/301 (49%) 



Query: 381 LCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYAQQDAQEFLCELLDKIQREL- 439 
           + E + + +W+G++ +SP ++ ++ F GY+QQD+QE L L+D + +L 
Sbjct: 826 VAEEFGIIMKALWTGQYRYISPKDFKITIGKINDQFAGYSQQDSQELLLFLMDGLHEDLN 885 

Query: 440 ETTGTSLPALIPTSQRKLI KQVLN--VVNNIFHGQLLSQVTCLACDNKSNT 488 
           E L + LN ++ +F GQ S V CL C KS T 
Sbjct: 886 KADNRKRYKEENNDHLDDFKAAEHAWQKHKQLNESIIVALFQGQFKSTVQCLTCHKKSRT 945 

Query: 489 IEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQCNSKRRRFS 548 
           E F LSL +C+ +D CL + +K E + + + C C ++R 
Sbjct: 946 FEAFMYLSLPLASTSKCTLQD CL--RLFSK--EEKLTDNNRFYCSHCRARR 992 

Query: 549 SKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFE-EILNMEPYCC-- 605 
           ++ K++ I LP VL +HLKRF + GR ++K+ V F E L++ Y 
Sbjct: 993 DSLKKIEIWKLPPVLLVHLKRFSYDGRW-KQKLQTSVDFPLENLDLSQYVIGP 1044 

Query: 606 RETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMCTMDEV 665 
           + LK Y+L +V H+G G GHYTAYC N+ W +D ++S ++ V 
Sbjct: 1045 KNNLKK YNLFSVSNHYG-GLDGGHYTAYCKNAARQRWFKFDDHEVSDISVSSV 1096 

Query: 666 CKAQAYILFYTQ RVTE 681 
           + AYILFYT RVT+ 
Sbjct: 1097 KSSAAYILFYTSLGPRVTD 1115 

Query: 200 QVKAELESMPPR--KSLRLQGLAQSTIIEIVSVQVPAQTPASPAKDKVLSTSENEISQKV 257 
           Q+ AE + P + +S + Q+ I+ + P TP ++K + EIS ++ 
Sbjct: 701 QIPAERDREPSKLKRSYSSPDITQA--IQEEEKRKPTVTPTVNRENKPTCYPKAEIS-RL 757 

Query: 258 SDSSVKR-RPIVT---PGVTGLRNLGNTCYMNSVLQVLS---HLLIF--RQCFLKLDLNQ 308 
           S S ++ P+ P +TGLRNLGNTCYMNS+LQ L HL + R C+ D+N+ 
Sbjct: 758 SASQIRNLNPVFGGSGPALTGLRNLGNTCYMNSILQCLCNAPHLADYFNRNCYQD-DINR 816 

Score = 50 (7.5 bits), Expect = 8.3e-23, Sum P(2) = 8.3e-23 
Identities = 29/106 (27%), Positives = 51/106 (48%) 

Query: 173 RKKQEEPFQEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQV 232 
           + KQE+ +E+ +++ K R++E E + K + E+ + QA+++SQ 
Sbjct: 475 KNKQEKELRERQQEEQKEKLRKEEQEQKAKKKQEA-EENEITEKQQKAKEEMEKKESEQA 533 

Query: 233 PAQ TPASPAKD KVLSTSENEIS--QKVSDSSVKRRPIVTPGV 272 
           + T A K+ K S SE+E S +K + KR P TP + 
Sbjct: 534 KKEDKETSAKRGKEITGVKRQSKSEHETSDAKKSVEDRGKRCP--TPEI 580 

Score = 42 (6.3 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22 
Identities = 13/58 (22%), Positives = 27/58 (46%) 

Query: 167 EQSPIGRKKQEEPFQEKIVVKREVKKRRQELEY-QVKAELESMPPRKSLRLQGLAQST 223 
           EQ +KKQE E +++ K+ ++ E Q K E + ++ + G+ + + 
Sbjct: 498 EQEQKAKKKQEAEENEITEKQQKAKEEMEKKESEQAKKEDKETSAKRGKEITGVKRQS 555 
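
The Expect and Sum P values in the HSP headers above agree to the printed precision. A minimal sketch of why that is expected, assuming the usual Poisson relation P = 1 - exp(-E) between an expectation E and the probability P of at least one chance hit; the function name is illustrative and not part of the report:

```python
import math


def expect_to_p(expect: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # -expm1(-E) equals 1 - exp(-E) but keeps precision when E is tiny.
    return -math.expm1(-expect)


# Expect values taken from the HSP headers, plus E = 1 for contrast.
for e in (9.3e-31, 8.3e-23, 5.7e-22, 1.0):
    print(f"E = {e:.1e}  ->  P = {expect_to_p(e):.3g}")
```

For values this small the two numbers are indistinguishable; they only diverge once E approaches 1.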



Pedant information for DKFZphtes3_27dl, frame 2 



Report for DKFZphtes3_27dl.2 



[LENGTH] 712 
[MW] 81155.71 
[pI] 8.21 
[HOMOL] SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) (DEUBIQUITINATING 
ENZYME 11) (KIAA0055). 4e-32 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YBL067c] 3e-19 
[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 4e-17 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHL010c] 3e-12 
[BLOCKS] BL00970A Nuclear transition protein 2 proteins 
[BLOCKS] BL00972D 
[BLOCKS] BL00972C 
[BLOCKS] BL00972B 
[BLOCKS] BL00972A 
[EC] 3.1.2.15 Ubiquitin thiolesterase 5e-06 
[PIRKW] alternative splicing 2e-11 
[PIRKW] thiolester hydrolase 5e-06 
[PIRKW] hydrolase 1e-14 
[SUPFAM] RING finger homology 7e-11 
[SUPFAM] deubiquinating enzyme SSV7 5e-16 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] UCH_2_2 1 
[PROSITE] PKC_PHOSPHO_SITE 17 
[PROSITE] ASN_GLYCOSYLATION 4 
[PROSITE] UCH_2_1 1 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.92 % 



SEQ MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH 
SEG 
PRD ccccccccchhhhhhhhcccccccccceeecccceeeeeeeccccccccchhhhhhhhhh 

SEQ FQESSHPVALEVNEMYVFCYLCDDYVLNDNATGDLKLLRRTLSAIKSQNYHCTTRSGRFL 
SEG 
PRD hhhhccceeecccceeeeeeccccccccccccchhhhhhhhhhhhhcccceeeccccccc 

SEQ RSMGTGDDSYFLHDGAQSLLQSEDQLYTALWHRRRILMGKIFRTWFEQSPIGRKKQEEPF 
SEG 
PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhh 

SEQ QEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQVPAQTPASP 
SEG xxxxxxxxxxxxxxxx 
PRD hheeehhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccccccc 

SEQ AKDKVLSTSENEISQKVSDSSVKRRPIVTPGVTGLRNLGNTCYMNSVLQVLSHLLIFRQC 
SEG 
PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ FLKLDLNQWLAMTASEKTRSCKHPPVTDTVVYQMNECQEKDTGFVCSRQSSLSSGLSGGA 
SEG xxxxxxxxxxxxxx 
PRD hhhhhhchhhhhhhhhhhhhhccccccceeehhhhhcccccccccccccccccccccccc 

SEQ SKGRKMELIQPKEPTSQYISLCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYA 
SEG xxxxx 
PRD ccccceeecccccccchhhhhhhhhhhhhhhhhccceeeeccchhhhhhhhhhhccccch 

SEQ QQDAQEFLCELLDKIQRELETTGTSLPALIPTSQRKLIKQVLNVVNNIFHGQLLSQVTCL 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhccccchhhhhhhhc 

SEQ ACDNKSNTIEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQC 
SEG 
PRD cccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhccceeecccc 

SEQ NSKRRRFSSKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFEEILNM 
SEG 
PRD ccccccccccchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccccceeeeccccccc 

SEQ EPYCCRETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMC 
SEG 
PRD ccccccccccccccceeeeeeeeeeeecccccccccceeeeccccccceeeecccccccc 

SEQ TMDEVCKAQAYILFYTQRVTENGHSKLLPPELLLGSQHPNEDADTSSNEILS 
SEG 
PRD cchhhhhhhhhhhhhheeeecccccccccccccccccccccccccccccccc 
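
The PRD rows above encode one predicted secondary-structure state per residue (h, e or c). A minimal sketch, assuming those three letters are the only states of interest, that summarises such a row as fractions; the function name is illustrative:

```python
from collections import Counter


def structure_composition(prd: str) -> dict:
    """Fraction of residues per predicted state (h, e, c) in a PRD string."""
    states = [ch for ch in prd if ch in "hec"]  # ignore spaces or other debris
    counts = Counter(states)
    total = len(states)
    return {s: counts.get(s, 0) / total for s in "hec"} if total else {}


# PRD row for the final 52 residues of the peptide, as printed in the report.
last_row = "cchhhhhhhhhhhhhheeeecccccccccccccccccccccccccccccccc"
print(structure_composition(last_row))
```

Concatenating all PRD rows before calling the function would give the composition for the whole protein.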
Prosite for DKFZphtes3_27dl.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00001 90->94 ASN_GLYCOSYLATION PDOC00001 
PS00001 484->488 ASN_GLYCOSYLATION PDOC00001 
PS00001 653->657 ASN_GLYCOSYLATION PDOC00001 
PS00004 545->549 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 506->509 PKC_PHOSPHO_SITE PDOC00005 
PS00005 542->545 PKC_PHOSPHO_SITE PDOC00005 
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005 
PS00005 580->583 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 611->614 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 223->227 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 525->529 CK2_PHOSPHO_SITE PDOC00006 
PS00006 661->665 CK2_PHOSPHO_SITE PDOC00006 
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006 
PS00007 193->200 TYR_PHOSPHO_SITE PDOC00007 
PS00007 192->200 TYR_PHOSPHO_SITE PDOC00007 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 355->361 MYRISTYL PDOC00008 
PS00008 359->365 MYRISTYL PDOC00008 
PS00008 471->477 MYRISTYL PDOC00008 
PS00008 589->595 MYRISTYL PDOC00008 
PS00009 171->175 AMIDATION PDOC00009 
PS00009 362->366 AMIDATION PDOC00009 
PS00972 274->290 UCH_2_1 PDOC00750 
PS00973 619->638 UCH_2_2 PDOC00750 




Pfam for DKFZphtes3_27dl.2 

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM       *GIqNlGNTCYMNSIIQCL* 
           G++NLGNTCYMNS++Q+L 
Query 274 GLRNLGNTCYMNSVLQVL 291 

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *YdLYgVICHYGntldyGHYWaYVKNenhHRWkWYYFDDEtV* 

YDL +V+ H+G + ++GHY+AY++N + ++W+ +D++ 
Query 619 YDLSAVVMHHGKGFGSGHYTAYCYNSE--GGFWVHCNDSKL 657 
DKFZphtes3_27k4 



PCT/IB00/01496 



group: transmembrane protein 

DKFZphtes3_27k4 encodes a novel 490 amino acid protein with similarity to two 
hypothetical C. elegans proteins. 

The novel protein contains 10 transmembrane regions and a leucine zipper. It is a member of 
a new family of proteins with 10 transmembrane domains that is specific to multicellular 
eukaryotes. 
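
The leucine zipper call can be reproduced on the C-terminal residues of the peptide. A minimal sketch, assuming the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L and assuming the report writes hits as start->end with the end one past the last matched residue; the function name and the `first_residue` parameter are illustrative:

```python
import re

# Assumed leucine zipper pattern: L-x(6)-L-x(6)-L-x(6)-L (22 residues).
# The lookahead allows overlapping candidates to be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(segment: str, first_residue: int = 1):
    """Yield (start, end) pairs, 1-based, with end one past the last residue."""
    for m in LEUCINE_ZIPPER.finditer(segment):
        start = first_residue + m.start()
        yield start, start + len(m.group(1))


# Residues 451-490 of the 490 amino acid peptide, as listed for frame 1.
c_terminus = "PDSFSIPYLTALGDLLGTALLALSFHFLWLIGDRDGDVGD"
print(list(find_leucine_zippers(c_terminus, first_residue=451)))
```

On this segment the sketch reports a single hit at 459->481, the same span the Prosite listing gives for LEUCINE_ZIPPER.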

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



strong similarity to C. elegans K07H8.2/ZK185.2 
membrane regions: 10 

complete cDNA, complete cds, potential start at bp 109, few EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 1901 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GTGATTTACC AGAAAAACCA AGAAGACAGG CACAAAAAAG CAAACGGCAT 
51 TTGGCAAGAT GGATTATCAA CTGCAGTACA GACTTTTAGT AATAGATCTG 
101 AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT 
151 AATGGGCACG CATCATCAAG CTGCAGCCAA AAGTATGATG ACTATGCCAA 
201 TTATAATTAC TGTGATGGAA GGGAGACTTC AGAAACCACT GCCATGTTAC 
251 AAGATGAAGA TATATCTAGT GATGGTGATG AAGATGCTAT TGTAGAAGTG 
301 ACCCCAAAAT TACCAAAGGA ATCCAGTGGC ATCATGGCAT TGCAAATACT 
351 TGTGCCCTTT TTGCTAGCTG GTTTTGGAAC AGTTTCAGCT GGCATGGTAC 
401 TGGATATAGT ACAGCACTGG GAGGTGTTCA GAAAAGTTAC AGAAGTTTTC 
451 ATTTTAGTCC CTGCACTTCT TGGTCTCAAA GGGAACTTGG AAATGACATT 
501 GGCATCCAGA TTATCCACTG CAGTAAATAT TGGGAAGATG GATTCACCCA 
551 TTGAAAAGTG GAACCTAATA ATTGGCAACT TGGCTTTAAA GCAGGTTCAG 
601 GCAACAGTAG TGGGTTTTCT AGCAGCTGTG GCAGCAATTA TATTGGGCTG 
651 GATTCCAGAA GGAAAATATT ACCTTGATCA TTCCATACTT CTGTGCTCTA 
701 GCAGTGTGGC AACTGCCTTC ATTGCATCTC TTCTGCAGGG AATAATAATG 
751 GTTGGGGTTA TCGTTGGTTC AAAGAAGACT GGTATAAATC CTGATAATGT 
801 TGCTACACCC ATTGCTGCTA GTTTTGGCGA CCTTATAACT CTTGCCATAT 
851 TGGCTTGGAT AAGTCAGGGC TTATACTCCT GTCTTGAGAC CTATTACTAC 
901 ATTTCTCCAT TAGTTGGTGT ATTTTTCTTG GCTCTAACCC CTATTTGGAT 
951 TATAATAGCT GCCAAACATC CAGCCACAAG AACAGTTCTC CACTCAGGCT 
1001 GGGAGCCTGT CATAACAGCT ATGGTTATAA GTAGCATTGG GGGCCTTATT 
1051 CTGGACACAA CTGTATCAGA CCCAAACTTG GTTGGGATTG TTGTTTACAC 
1101 GCCAGTTATT AATGGTATTG GTGGTAATTT GGTGGCCATT CAGGCTAGCA 
1151 GGATTTCTAC CTACCTCCAT TTACATAGCA TTCCAGGAGA ATTGCCTGAT 
1201 GAACCCAAAG GTTGTTACTA CCCATTTAGA ACTTTCTTTG GTCCAGGAGT 
1251 AAATAATAAG TCTGCTCAAG TTCTACTGCT TTTAGTGATT CCTGGACATT 
1301 TAATTTTCCT CTACACTATT CATTTGATGA AAAGTGGTCA TACTTCTTTA 
1351 ACTATAATCT TCATAGTAGT GTATTTATTT GGCGCTGTGT TACAGGTATT 
1401 TACCTTGCTG TGGATTGCTG ACTGGATGGT CCATCACTTC TGGAGGAAAG 
1451 GAAAGGACCC GGATAGTTTC TCCATCCCCT ACCTAACAGC ATTGGGTGAT 
1501 CTGCTCGGGA CAGCTCTGTT AGCCTTAAGT TTTCATTTTC TTTGGCTTAT 
1551 TGGAGATCGA GATGGAGATG TTGGAGACTA ATAAATTCTA CAAACTGCTC 
1601 TCAAGTTACC AAGGAAGAAA ATACACGACA ACCACTTATG GCTCTTTTTC 
1651 AAAACTCTTA AATCAGTAGT TTGACTTTTG CCAGGGTAAT CTTCAGTTGG 
1701 CCCTGATTCA ATTAAATGGC CTTAATTTTT TTTTAAGGAA TTTGTGTCAA 
1751 AACCAGAATG AAGAGTATTC GTGCTGCTTT TCATAGAATA AATGATAATT 
1801 TGACATAGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1851 AAAAAAAAAA AAGGGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
1901 G 



BLAST Results 



No BLAST result 



Medline entries 
No Medline entry 



Peptide information for frame 1 



ORF from 109 bp to 1578 bp; peptide length: 490 
Category: similarity to unknown protein 



1 MEYHSFSEQS FHANNGHASS SCSQKYDDYA NYNYCDGRET SETTAMLQDE 
51 DISSDGDEDA IVEVTPKLPK ESSGIMALQI LVPFLLAGFG TVSAGMVLDI 
101 VQHWEVFRKV TEVFILVPAL LGLKGNLEMT LASRLSTAVN IGKMDSPIEK 
151 WNLIIGNLAL KQVQATVVGF LAAVAAIILG WIPEGKYYLD HSILLCSSSV 
201 ATAFIASLLQ GIIMVGVIVG SKKTGINPDN VATPIAASFG DLITLAILAW 
251 ISQGLYSCLE TYYYISPLVG VFFLALTPIW IIIAAKHPAT RTVLHSGWEP 
301 VITAMVISSI GGLILDTTVS DPNLVGIVVY TPVINGIGGN LVAIQASRIS 
351 TYLHLHSIPG ELPDEPKGCY YPFRTFFGPG VNNKSAQVLL LLVIPGHLIF 
401 LYTIHLMKSG HTSLTIIFIV VYLFGAVLQV FTLLWIADWM VHHFWRKGKD 
451 PDSFSIPYLT ALGDLLGTAL LALSFHFLWL IGDRDGDVGD 
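
The ORF coordinates and the peptide length in each "Peptide information" block can be cross-checked against each other. A minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon (which is how the listed values appear to behave); the function name is illustrative:

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# (ORF start, ORF end, listed peptide length) as given in the reports.
listed_orfs = [(251, 2386, 712), (109, 1578, 490), (400, 1473, 358), (328, 618, 97)]
for start, end, listed in listed_orfs:
    print(start, end, orf_peptide_length(start, end), listed)
```

A mismatch between the computed and listed length would point to a transcription error in either the coordinates or the peptide.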

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27k4, frame 1 

TREMBL:AF036704_2 gene: "ZK185.2"; Caenorhabditis elegans cosmid 
ZK185., N = 1, Score = 730, P = 3.1e-72 

TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid 
K07H8., N = 1, Score = 940, P = 1.7e-94 

>TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 
Length = 507 

HSPs: 

Score = 940 (141.0 bits), Expect = 1.7e-94, P = 1.7e-94 
Identities = 204/412 (49%), Positives = 271/412 (65%) 

Query:  68 LPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPALLGLKGNL 
           +P ESS ++ Q+L PF +AG G V AG+VL IV W +F ++ E+ ILVPALLGLKGNL 
Sbjct:  82 

Query: 128 
           EMTLASRLST N+G MDS ++ +++I NLAL QVQATVV FLA+ A L +IP G + 
Sbjct: 142 

Query: 188 
           H -L+C+SS+ATA ASL+ ++MV VIV S+K INPDNVATPIAAS GDL TL + 
Sbjct: 202 

Query: 248 
           +++ +V V FL L P WI IA ++ T+ L++GW PVI +M+I 
Sbjct: 262 

Query: 308 
           SS GG IL+T V + + Y PV+NG+GGNL A+QASR+STY H G LP+E 
Sbjct: 322 

Query: 368 VNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIVV 421 
           +++SA+VLLLLV+PGH+ F + I L K+ T +F + 
Sbjct: 380 

Query: 422 
           Y+ A++QV LL++ +V W+ DPD+ I PYLTALGDLLGT LL + F 
Sbjct: 440 YMIAAIIQVVILLFVCQLLVALLWKWKIDPDNSVIPYLTALGDLLGTGLLFIVF 

Pedant information for DKFZphtes3_27k4, frame 1 

Report for DKFZphtes3_27k4.1 

[LENGTH] 490 
[MW] 53266.39 
[pI] 5.29 
[HOMOL] TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 4e-94 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PROKAR_LIPOPROTEIN 2 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 10 
[KW] LOW_COMPLEXITY 3.06 % 






SEQ MEYHSFSEQSFHANNGHASSSCSQKYDDYANYNYCDGRETSETTAMLQDEDISSDGDEDA 

SEG 

PRD cccccccceeeccccccccccccccccccceeecccccccchhhhhhhhcccccccccee 

MEM 

SEQ IVEVTPKLPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPAL 

SEG 

PRD eeeeeccccccchhhhhhhhhhhhhhhcccchhhhhhhhhcchhhhhcccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . MMMMMMMMMMMMMMM 

SEQ LGLKGNLEMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILG 

SEG 

PRD ccccchhhhhhhhhhhhhhccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMM MMMMMMMMMMMMMMMMM 

SEQ WIPEGKYYLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFG 

SEG 

PRD hcccceeecccceeehhhhhhhhhhhhhhhhhhhhheeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ DLITLAILAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEP 

SEG 

PRD cchhhhhhhhhhhhhhhhcceeeeehhhhhhhhhhchhhhhhhhccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMM .... MMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ VITAMVISSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPG 

SEG 

PRD hcchhhhhhcceeeeccccccccceeeeeeceeeecccccceeeeehhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ ELPDEPKGCYYPFRTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIV 

SEG 

PRD cccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhcccccccceeeehhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM. . .MMMMMMM 

SEQ VYLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSFHFLWL 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccceeeeeecchhhhhhhhhhhhheeee 

MEM MMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IGDRDGDVGD 

SEG 

PRD eecccccccc 

MEM MM 



Prosite for DKFZphtes3_27k4.1 

PS00001 383->387 ASN_GLYCOSYLATION PDOC00001 
PS00004 108->112 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005 
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007 
PS00008 90->96 MYRISTYL PDOC00008 
PS00008 122->128 MYRISTYL PDOC00008 
PS00008 216->222 MYRISTYL PDOC00008 
PS00008 220->226 MYRISTYL PDOC00008 
PS00008 254->260 MYRISTYL PDOC00008 
PS00008 336->342 MYRISTYL PDOC00008 
PS00008 339->345 MYRISTYL PDOC00008 
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013 
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 459->481 LEUCINE_ZIPPER PDOC00029 
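
The per-motif counts in the Pedant summary can be checked against the individual Prosite hits. A minimal sketch, assuming each hit is written on one row as accession, start->end, motif name and an optional documentation id; the row format, regular expression and function name are assumptions for illustration:

```python
import re
from collections import Counter

# One hit per row: accession, start->end, motif name, optional PDOC id.
ROW = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)(?:\s+(PDOC\d{5}))?$")


def parse_prosite_rows(text: str):
    """Return a list of (accession, start, end, name, doc_id) tuples."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            acc, start, end, name, doc = m.groups()
            hits.append((acc, int(start), int(end), name, doc))
    return hits


# A few hits from the listing for frame 1, written one per row.
sample = """\
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013
PS00029 459->481 LEUCINE_ZIPPER PDOC00029
"""
counts = Counter(name for _, _, _, name, _ in parse_prosite_rows(sample))
print(counts)
```

For these three motifs the counts (3, 2 and 1) agree with the [PROSITE] lines of the Pedant report.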



(No Pfam data available for DKFZphtes3_27k4.1) 
DKFZphtes3_27ol4 



group: testes derived 

DKFZphtes3_27ol4 encodes a novel 358 amino acid protein with similarity to C. elegans cosmid 
C55A6. 

The new protein contains a C3HC4 zinc finger (RING finger) signature. The RING finger 
structure binds two atoms of zinc and is involved in mediating protein-protein interactions. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C. elegans C55A6.1 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: /map="6" 

Insert length: 2158 bp 

Poly A stretch at pos. 2137, polyadenylation signal at pos. 2120 



1 CCGAGGCCAG AGAGAAAAGA CTGCGAGGTG GCCGCAGCTG TGGCCGGAGA 
51 GCACAAAGAA TGAACCAGCA GTGGAAGAGA AAATACTGTA AGCTGGCTGA 
101 CTGCTGGTGA AGAAAATGCT TTATTTTTGT GGCAGGCATC TGTGGGATCT 
151 GTAATAGAAA TATATTGGAG TAATTCAAGA TTCTGTGGTT GGCCCTTTTG 
201 ACTGCTCTCT CTACAGGTTT AATTTGGGCA TTTACTCATT TTCATGGCTC 
251 CAAGGACCAT GTATGTGTTG GGGATCTTCA ATATTCATGT TATTTTCTCC 
301 TTTGGTCTTA TATGATTGTT ACCTTTATGA AGCTTTAGTG ATTACAAAGC 
351 ACTTTTTTTG TCCATTTTTA CCTGAGCTTT GTAAACTCTG ATTTGCAGGA 
401 TGGCTGGCTG TGGTGAAATT GATCATTCAA TAAACATGCT TCCTACAAAC 
451 AGGAAAGCGA ACGAGTCCTG TTCTAATACT GCACCTTCTT TAACCGTCCC 
501 TGAATGTGCC ATTTGTCTGC AAACATGTGT TCATCCAGTC AGTCTGCCCT 
551 GTAAGCACGT TTTCTGCTAT CTATGTGTAA AAGGAGCTTC ATGGCTTGGA 
601 AAGCGGTGTG CTCTTTGTCG ACAAGAAATT CCCGAGGATT TCCTTGACAA 
651 GCCAACCTTG TTGTCACCAG AAGAACTCAA GGCAGCAAGT AGAGGAAATG 
701 GTGAATATGC ATGGTATTAT GAAGGAAGAA ATGGGTGGTG GCAGTACGAT 
751 GAGCGCACTA GTAGAGAGCT GGAAGATGCT TTTTCCAAAG GTAAAAAGAA 
801 CACTGAAATG TTAATTGCTG GCTTTCTGTA TGTCGCTGAT CTTGAAAACA 
851 TGGTTCAATA TAGGAGAAAT GAACATGGAC GTCGCAGGAA GATTAAGCGA 
901 GATATAATAG ATATACCAAA GAAGGGAGTA GCTGGACTTA GGCTAGACTG 
951 TGATGCTAAT ACCGTAAACC TAGCAAGAGA GAGCTCTGCT GACGGAGCGG 
1001 ACAGTGTATC AGCACAGAGT GGAGCTTCTG TTCAGCCCCT AGTGTCTTCT 
1051 GTAAGGCCCC TAACATCAGT AGATGGTCAG TTAACAAGCC CTGCAACACC 
1101 ATCCCCTGAT GCAAGCACTT CTCTGGAAGA CTCTTTTGCT CATTTACAAC 
1151 TCAGTGGAGA CAACACAGCT GAAAGGAGTC ATAGGGGAGA AGGAGAAGAA 
1201 GATCATGAAT CACCATCTTC AGGCAGGGTA CCAGCACCAG ACACCTCCAT 
1251 TGAAGAAACT GAATCAGATG CCAGTAGTGA TAGTGAGGAT GTATCTGCAG 
1301 TTGTTGCACA GCACTCCTTG ACCCAACAGA GACTTTTGGT TTCTAATGCA 
1351 AACCAGACAG TACCCGATCG ATCAGATCGA TCGGGAACTG ATCGATCAGT 
1401 AGCAGGGGGT GGAACAGTGA GTGTCAGTGT CAGATCTAGA AGGCCTGATG 
1451 GACAGTGCAC AGTAACTGAA GTTTAAATAA AAATGTCTTC AGCTCCATGC 
1501 TCAAGGTTGA AAGGGTTACC TGTAAATTTC TGCCCACATA ACATTATACT 
1551 CATCCCTAGT AGTGCATTTT GGGAGTTGGG GTGGGAAGGG GTATGGGAAG 
1601 GATAGACTCA TAATTAAAAT GTCTAACATG TCTCTGTTGA GAAATTTATT 
1651 TAATGTAAGG AACTTGGGTG TTAATAGTTG AGAGCTGTTT AGTAATAACC 
1701 CAGTTTTCTT GAGGTCTGTT TACTTTATAC TTTTTAAAAA CTTCTGTAGT 
1751 TCTTTTGGCC AGTGTGTTTG TATTATCTGT GCATTAATGG TCCTCATCTG 
1801 ACTCCTGCAT TGTGTCTTAT TTTTCTGCAT GGATTGGCAT AAGACCATTA 
1851 CTAAAATTTG GCACCTGTGA GATGTTTGAT ATTATGAACA GGAAACATAA 
1901 TTTAATGTAT GAATAGATGT GAATTTGGGA TTTCAAAATA GATGAATAAC 
1951 AACTATTTTA TAGTAAAGTT ATTGAAATGG AAATGAAAAC AGCCAGTAAC 
2001 TTATGTTTCA GAATGTTTGT AACACACTTC ATGGTGTTCC CATAGGCTTT 
2051 GCTGTCTAGT CTTATAGTTT GAGGTTTTTT TGGTCTGCAT TTTTCTTTTT 
2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA 
2151 AAAAAAAG 
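
The numbered sequence rows can be joined back into one string and checked against the stated insert length. A minimal sketch, assuming each row starts with the 1-based position of its first base followed by blocks of ten bases; the function name is illustrative:

```python
def parse_numbered_block(text: str):
    """Return (first_position, sequence) from rows like '2101 GATTACAAAA TTTATAATTT'."""
    first = None
    chunks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and anything that is not a sequence row
        if first is None:
            first = int(fields[0])
        # Keep only alphabetic blocks so stray digits from OCR are dropped.
        chunks.append("".join(f for f in fields[1:] if f.isalpha()))
    return first, "".join(chunks)


# Last two rows of the 2158 bp insert listed above.
tail = """\
2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA
2151 AAAAAAAG
"""
first, seq = parse_numbered_block(tail)
last_position = first + len(seq) - 1
print(last_position)
```

The position of the final base computed this way equals the insert length given in the header (2158 bp).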



BLAST Results 



Entry HSG117 from database EMBL: 
human STS SHGC-36270. 
Score = 1148, P = 8.9e-45, identities = 240/250 
Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 400 bp to 1473 bp; peptide length: 358 
Category: similarity to unknown protein 
Prosite motifs: ZINC_FINGER_C3HC4 (51-61) 



1 MAGCGEIDHS INMLPTNRKA NESCSNTAPS LTVPECAICL QTCVHPVSLP 
51 CKHVFCYLCV KGASWLGKRC ALCRQEIPED FLDKPTLLSP EELKAASRGN 
101 GEYAWYYEGR NGWWQYDERT SRELEDAFSK GKKNTEMLIA GFLYVADLEN 
151 MVQYRRNEHG RRRKIKRDII DIPKKGVAGL RLDCDANTVN LARESSADGA 
201 DSVSAQSGAS VQPLVSSVRP LTSVDGQLTS PATPSPDAST SLEDSFAHLQ 
251 LSGDNTAERS HRGEGEEDHE SPSSGRVPAP DTSIEETESD ASSDSEDVSA 
301 VVAQHSLTQQ RLLVSNANQT VPDRSDRSGT DRSVAGGGTV SVSVRSRRPD 
351 GQCTVTEV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27ol4, frame 1 

TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6, 
N = 2, Score = 165, P = 4.2e-15 

SWISSPROT:YWZ6_CAEEL HYPOTHETICAL 39.3 KD PROTEIN C02B8.6 IN CHROMOSOME 
X., N = 2, Score = 136, P = 3.1e-11 



>TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 
Length = 484 

HSPs: 

Score = 165 (24.8 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 42/106 (39%), Positives = 61/106 (57%) 



Query: 75 QEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRN-GWWQYDERTSRELEDAFSKGKK 133 

Q +P LD ++ PEE K Y W Y G+N GWW+++ R RE+E+A++ GK 

Sbjct: 93 QNVPALDLDA-SICDPEERK Y-WI YSGKNQGWWRFEPRNEREIEEAYNAGKC 142 

Query: 134 NTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKR DI I D- I PKKGVAGL 180 

+ E++I G YV D +QY R + R +KR DDI KG+AG+ 
Sbjct: 143 HCEVVICGRPYVIDFHQFLQYPRGVPNQARHVKRVSADDFDGIGVKGLAGI 193 

Score = 96 (14.4 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 19/54 (35%), Positives = 30/54 (55%) 

Query: 35 ECAICLQTCVHPVSLP-CKHVFCYLCVKGASW--LGKRCALCRQEIPEDFLDKPT 86 

EC IC + P ++P C H FC++C+KG +G C +CR I + +P+ 

Sbjct: 11 ECPICQCKMIVPTTIPACGHKFCFICLKGVYMNDMGG-CPMCRGPIDSNIFAQPS 64 
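
The Identities and Positives percentages in the HSP headers can be recomputed from the raw counts. A minimal sketch, assuming the statistics line has the form shown in the headers and that the program truncates rather than rounds (42/106 is 39.6% but is printed as 39%); the regular expression and function name are illustrative:

```python
import re

STAT = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_percentages(line: str):
    """Return {label: (matches, length, listed_pct, truncated_pct)} for one line."""
    out = {}
    for label, num, den, pct in STAT.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        out[label] = (num, den, pct, num * 100 // den)  # integer division truncates
    return out


print(check_percentages("Identities = 42/106 (39%), Positives = 61/106 (57%)"))
print(check_percentages("Identities = 19/54 (35%), Positives = 30/54 (55%)"))
```

A listed percentage that differs from the truncated value is a useful flag for a transcription error in either the counts or the percentage.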

Pedant information for DKFZphtes3_27ol4, frame 1 



Report for DKFZphtes3_27ol4 . 1 

[LENGTH] 358 
[MW] 38818.90 
[pI] 5.17 
[HOMOL] TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 2e-12 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision 
repair) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 4e-04 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 4e-04 
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 
[PROSITE] MYRISTYL 2 
[PROSITE] AMIDATION 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] ZINC_FINGER_C3HC4 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Zinc finger, C3HC4 type (RING finger) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 19.83 % 

SEQ MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLPCKHVFCYLCV 
SEG 
1rmd- TTTTTEETTTEEEETTTEEEEHHHH 

SEQ KGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRNGWWQYDERT 
SEG 
1rmd- HHHHHHCCBTTTTTCBCGGG-CBCC 

SEQ SRELEDAFSKGKKNTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKRDIIDIPKKGVAGL 
SEG xxxxxxxxxxxxxxx 
1rmd- 

SEQ RLDCDANTVNLARESSADGADSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDAST 
SEG xxxxxxxxxxxx 
1rmd- 

SEQ SLEDSFAHLQLSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA 
SEG x xxxxxxxxxxxxxxxxxxxx 
1rmd- 

SEQ VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPDGQCTVTEV 
SEG xxx xxxxxxxxxxxxxxxxxxxx 
1rmd- 



Prosite for DKFZphtes3_27ol4.1 

ZINC_FINGER_C3HC4 PDOC00449 

Pfam for DKFZphtes3_27ol4.1 



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM * C P I C FcT FQ1 Dy PW P Fde PmMl PCgH s FC y pC I r r W C PmC * 

C+IC L + P++LPC+H+FCY C++ C +C 

Query 36 CAICLQTCVHPVSLPCKHVFCYLCVKGASWLGKRCALC 73 
DKFZphtes3_28dl4 



group: testes derived 

DKFZphtes3_28dl4 encodes a novel 97 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1279 bp 

Poly A stretch at pos. 1232, no polyadenylation signal found 



1 GGAGCTCAGA AGTTGGGCAA AGGTCACAGC AGACTTCCTG AAAAGCAGAC 

51 ACTGAGGAAC ACAGTGGAGA GCGGGAGTTC ACAGCGACGC AGCTGAGGAC 

101 GACGCAGGAC CTCTCCCAAA GGTGCTGCAG CTCCAGCACC AGGGGCCAGG 

151 GCTGCGGCGA CAGCAGCTCA GCAACCCTTG CTGTGCTCAA GTTCTTGGGG 

201 ATTCAGAGCT AAGTTCAAAA TTTAGAAACA GTGCCTTAAA GACGGGCAAG 

251 AAAACCCGGT GTGGGAGTCT GCTCATCTAT GGTTTGTTAC TGCTCTCGCT 

301 TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG 

351 CAGAGTGAGG AGGAGACAGG AGCGTGTGCA CCTTCCATCT GTGAGAGGCA 

401 CACTTCAGTC TGGGTTCAAG ATGCAGAATG GTGCCTACAG CAAAAAAAAA 

451 AAAAACACCC TCCTCCCTTC TTTACCATTT GAATGGACAT TTTCCTTACC 

501 TGTGATCCCA ACAGAAACAG ATCCAGACCT ATCATGTGAA GTCCACGTTC 

551 CAGGATCAGA AGTAACCAGT TTATGGACTG AGCTTACACG GGAAAGTCTA 

601 CCCCCGACTC CTTCTGGATA GTAACATACA CAGCTGCATA AAAACGTCTC 

651 CAAGGGGACA TACGATGCAT TTGCTTGGTG TCCCAGCCAA GCTCCCCACC 

701 GGCGACCTCA CTGTTCCTTA GAGCTCGAGA GCTCGTCTCC TATCAATCAG 

751 AGAACCCCAT CAGCTGTGAC CAACAGAGCT GGAGCCCTCT GTGGAGGGAG 

801 CTGACCCCAC ACACAGGACA GAGCAGAATC CTGATTATTT TACAAACTGC 

851 AAACCTTCTG AGTAAGAAGA CAAAAATATA CATTCCAAGG TATCTGTAAA 

901 GTGCTTGGAA GATGCAGACA GCTGCACCGA GGGGCTCTGA TCCATCCACA 

951 CGCTGCGCTT TGCTGCGGTC ACACACACGG TCTCAGTCAC GTGATGGTTT 

1001 TGCTTTTATT TCTTAAACGG CTGAGTGATA ATCCAGCTAG TGTGCAGTCA 

1051 TTTCATACCT TTCAATGGGC GTCACCGCAG TGACGCTGCC CCAGCCCCAT 

1101 GCTGAGGGCC GACACAATTC ACGGAACAGA TTCATCATAT TTGGTCTTTA 

1151 TGTAAATAAT AAATGTTTTA AAATTGCCTA AATATAAAAA AAAAAAAAAA 

1201 AAAAAAAAAA AAAAAAAAAA AAAGGGCGGC CGAAAAAAAA AAAAAAAAAA 

1251 AAAAAAAAAA AAAAAAAAAA GGGCGGCCG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 618 bp; peptide length: 97 
Category: putative protein 
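The peptide length follows from the ORF coordinates by simple arithmetic. The sketch below assumes the reported coordinates are 1-based and inclusive and that the range excludes the stop codon; the function name `peptide_length` is hypothetical and used only for illustration.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates.

    Assumption: the reported range does not include the stop codon.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


# ORF from 328 bp to 618 bp: (618 - 328 + 1) / 3 = 97 residues
print(peptide_length(328, 618))
```

Under these assumptions the result agrees with the peptide length of 97 stated for this ORF.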



1 MKKPSERGRV RRRQERVHLP SVRGTLQSGF KMQNGAYSKK KKNTLLPSLP 
51 FEWTFSLPVI PTETDPDLSC EVHVPGSEVT SLWTELTRES LPPTPSG 
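The peptide can be reproduced from the cDNA by conceptual translation of the ORF. The following is a minimal sketch, assuming the standard genetic code and 1-based inclusive coordinates; `translate_orf` is a hypothetical helper, not part of any tool named in this report.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str, start: int, end: int) -> str:
    """Translate dna[start..end] (1-based, inclusive) codon by codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    orf = dna[start - 1:end]
    usable = len(orf) - len(orf) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


# Bases 301-350 of the insert; the ATG at position 328 is base 28 of this fragment.
fragment = "TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG"
print(translate_orf(fragment, 28, 48))  # MKKPSER
```

The seven residues obtained from this fragment match the start of the peptide listed above.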

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_28dl4, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_28dl4, frame 1 

Report for DKFZphtes3_28dl4 . 1



[LENGTH] 97

[MW] 10945.56 

[pI] 9.80

[PROSITE] MYRISTYL 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 12.37 % 



SEQ MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI 

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccc 

SEQ PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG 

SEG 

PRD ccccccccceeeecccccchhhhhhhhhhcccccccc 
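The LOW_COMPLEXITY keyword appears to be the share of residues masked with `x` on the SEG lines. The sketch below assumes exactly that definition (masked residues divided by peptide length); the function name is hypothetical, and only the `x` characters are counted because the column alignment of the mask is not preserved in this listing.

```python
def low_complexity_percent(seg_lines, peptide_length):
    """Percentage of residues flagged 'x' by SEG, rounded to two decimals."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# One SEG line with 12 masked residues, one empty SEG line, 97 residues in total.
print(low_complexity_percent(["xxxxxxxxxxxx", ""], 97))  # 12.37
```

With 12 masked residues out of 97, this reproduces the 12.37 % given in the report above.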



Prosite for DKFZphtes3_28dl4 . 1 
CAMP_PHOSPHO_SITE PDOC00004
CAMP_PHOSPHO_SITE PDOC00004
PKC_PHOSPHO_SITE PDOC00005
PKC_PHOSPHO_SITE PDOC00005
PKC_PHOSPHO_SITE PDOC00005
CK2_PHOSPHO_SITE PDOC00006
CK2_PHOSPHO_SITE PDOC00006
MYRISTYL PDOC00008
MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_28dl4 . 1) 
DKFZphtes3_2all
group: testes derived 

DKFZphtes3_2all encodes a novel 1048 amino acid protein with very weak similarity to mucins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to mucin 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4082 bp 

Poly A stretch at pos. 4060, polyadenylation signal at pos. 4034 

1 GAGGACTGCG AGCACAGCGG CGGCCGGGTG GCGGGGGTGA GTGGGGCCAG 
51 CGGGGCTGGA CAGCAGCGGG CCCCGGGCGC CGCCGCCGCG ATCCCTCCCC 

101 GCGCCCGCCG AGCACATCGC CGCCGCCGAG ATGGGCCCTC CGCGGCACCC 

151 CCAGGCCGGC GAGATAGAAG CGGGCGGTGC GGGCGGCGGG CGGCGGCTAC 

201 AGGTGGAAAT GAGTTCTCAA CAGTTTCCTC GGTTAGGAGC CCCTTCTACC 

251 GGGCTGAGCC AGGCCCCTTC TCAGATTGCA AACAGTGGTT CTGCTGGATT 

301 GATAAACCCA GCTGCTACAG TCAATGATGA ATCTGGTCGA GATTCTGAAG 

351 TCAGTGCCAG GGAGCACATG AGTTCCAGCA GCTCCCTCCA GTCCCGGGAG 

401 GAGAAGCAAG AGCCTGTTGT GGTAAGGCCC TATCCACAGG TGCAGATGTT 

451 GTCGACACAC CATGCTGTCG CATCAGCCAC ACCTGTTGCA GTGACAGCCC 

501 CGCCAGCACA CCTGACGCCA GCAGTGCCAC TTTCATTTTC GGAGGGACTT 

551 ATGAAGCCGC CCCCGAAGCC CACCATGCCT AGCCGTCCCA TTGCTCCTGC 

601 TCCACCTTCT ACCCTGTCAC TTCCCCCCAA GGTTCCAGGG CAGGTTACCG 

651 TTACCATGGA GAGTAGCATC CCTCAAGCTT CAGCCATTCC TGTGGCAACA 

701 ATCAGTGGAC AACAGGGCCA TCCCAGTAAC CTGCATCACA TCATGACTAC 

751 AAATGTGCAA ATGTCTATCA TCCGCAGCAA TGCTCCTGGG CCCCCTCTTC 

801 ACATTGGAGC TTCTCATTTA CCTCGAGGTG CAGCTGCTGC TGCTGTGATG 

851 TCCAGTTCTA AAGTAACCAC AGTCCTGAGG CCGACCTCAC AGCTGCCAAA 

901 TGCTGCTACT GCTCAGCCAG CAGTACAGCA CATCATTCAC CAACCAATCC 

951 AGTCTCGGCC ACCTGTGACC ACCTCCAATG CCATCCCTCC TGCTGTGGTA 
1001 GCAACTGTCT CAGCCACCAG AGCTCAGTCT CCAGTCATCA CTACGACAGC 
1051 GGCGCATGCT ACTGATTCAG CACTTAGTAG GCCAACCTTG TCTATCCAGC
1101 ATCCTCCATC TGCAGCAATC AGTATTCAGC GTCCTGCCCA GTCACGAGAT 
1151 GTCACAACAA GAATCACACT ACCATCTCAC CCTGCATTAG GGACGCCAAA 
1201 ACAGCAGCTT CATACAATGG CTCAGAAAAC AATCTTCAGT ACTGGCACGC 
1251 CAGTGGCTGC AGCCACAGTA GCACCTATTT TGGCAACCAA CACCATTCCT 
1301 TCAGCGACCA CAGCTGGATC TGTGTCACAC ACGCAAGCTC CCACAAGTAC 
1351 CATTGTTACC ATGACAGTAC CCTCCCATTC CTCCCATGCT ACTGCTGTGA 
1401 CCACCTCAAA CATCCCAGTC GCCAAGGTGG TGCCCCAGCA GATCACGCAC 
1451 ACTTCTCCTC GGATCCAGCC AGACTACCCT GCCGAGAGGA GTAGCCTGAT 
1501 TCCCATCTCC GGACATCGGG CCTCTCCCAA TCCTGTGGCC ATGGAAACCC 
1551 GAAGTGACAA CAGACCGTCT GTTCCCGTTC AGTTCCAATA TTTTTTGCCA 
1601 ACTTACCCCC CTTCTGCATA CCCACTGGCG GCACATACCT ACACCCCAAT 
1651 CACCAGTTCC GTGTCCACTA TCCGACAGTA TCCAGTTTCA GCTCAGGCTC 
1701 CAAACTCTGC CATCACAGCT CAGACTGGTG TTGGGGTAGC GTCTACCGTC 
1751 CACCTAAACC CCATGCAGTT GATGACAGTG GATGCATCGC ATGCTCGACA 
1801 TATTCAAGGG ATCCAGCCAG CACCCATCAG TACCCAGGGT ATCCAGCCGG 
1851 CCCCCATTGG GACCCCAGGG ATACAGCCTG CACCACTTGG CACACAGGGA 
1901 ATTCACTCAG CAACCCCAAT CAACACACAA GGGCTTCAGC CTGCACCTAT 
1951 GGGTACTCAG CAGCCTCAGC CTGAAGGAAA GACTTCAGCA GTGGTGTTGG 
2001 CAGATGGAGC CACAATTGTG GCCAACCCTA TTAGCAATCC ATTCAGTGCT 
2051 GCTCCAGCAG CAACAACCGT GGTGCAGACC CACAGCCAGA GTGCTAGCAC 
2101 CAACGCTCCC GCCCAGGGCT CATCGCCACG GCCAAGCATA CTCCGGAAGA 
2151 AACCTGCCAC AGATGGTGCC AAACCCAAGT CTGAAATCCA CGTGTCTATG 
2201 GCCACTCCGG TCACTGTGTC CATGGAGACT GTATCCAATC AAAATAATGA 
2251 TCAGCCTACC ATTGCCGTCC CTCCAACTGC CCAGCAGCCC CCACCGACCA 
2301 TTCCAACTAT GATTGCAGCA GCCAGTCCCC CGTCACAACC AGCCGTTGCC 
2351 CTTTCAACCA TTCCTGGAGC GGTCCCCATC ACTCCACCCA TCACCACCAT 
2401 TGCAGCTGCA CCACCTCCAT CAGTCACTGT GGGTGGCAGT CTTTCCTCCG 
2451 TCTTGGGCCC TCCCGTTCCT GAAATTAAAG TGAAAGAAGA AGTAGAACCA
2501 ATGGATATCA TGAGGCCAGT TTCTGCAGTT CCTCCACTGG CTACCAACAC 
2551 TGTGTCTCCA TCTCTTGCAT TGCTGGCAAA CAACTTGTCC ATGCCTACAA 
2601 GTGACCTACC ACCTGGTGCC TCCCCAAGGA AAAAGCCTCG AAAGCAACAG 
2651 CATGTGATCT CAACAGAAGA AGGTGACATG ATGGAGACAA ACAGCACTGA 
2701 TGATGAGAAG TCCACTGCCA AGAGTCTTCT GGTGAAGGCT GAGAAGCGCA 
2751 AGTCTCCTCC CAAGGAGTAT ATTGATGAGG AAGGTGTGAG ATATGTCCCA
2801 GTGCGTCCAA GACCCCCCAT TACTTTGCTT CGTCACTATC GGAACCCCTG 
2851 GAAAGCTGCT TACCACCACT TTCAGAGGTA CAGTGACGTC CGGGTCAAAG 
2901 AGGAGAAGAA AGCTATGCTG CAGGAAATAG CTAATCAGAA AGGAGTATCC 
2951 TGTCGTGCTC AAGGCTGGAA AGTCCACCTC TGTGCTGCCC AGTTACTACA 
3001 GCTGACGAAT CTAGAACATG ATGTCTATGA AAGACTTACT AACCTGCAGG 
3051 AAGGGATTAT CCCAAAGAAA AAAGCAGCAA CAGATGATGA TCTCCACCGA 
3101 ATAAACGAAC TGATACAGGG AAATATGCAG AGGTGTAAAC TTGTGATGGA 
3151 TCAAATCAGT GAAGCCAGAG ACTCCATGCT TAAGGTTTTA GATCATAAAG 
3201 ACCGTGTCCT GAAGCTGCTT AACAAGAACG GGACTGTCAA AAAAGTGTCC 
3251 AAATTGAAGC GAAAGGAAAA AGTCTAGACC CAGAACAATC AGGAGATTGG 
3301 AAGCAAATTT ATGAAGAATG ATGGTGGGGG TGGGGGGAGG GTTTTGGTTT 
3351 TTTCCAAAGT GGAACATTGA AATAAAGGAA GTGTTCCTTA GTTCCCGTGT 
3401 GAAAGCAGAG GAACCCATGA CATCCAAGGG CGTGAAAGGA TCAGAGCTGA 
3451 CTGGACATAG TGAGCTGCCT TCTTGCGTTC GGGTGCACCC CTGTTAAACC 
3501 TGATCTGTGT CATAAGTGAC TCCGGATGCA TCAGTGTCCA CCAGTTGGAA 
3551 GCAATGACAA GGATGGCTGG CTGGTGTTTT TCAGCCTTCC GGTTTATAGA 
3601 CTGTATTTAT CTAGTGGATT CCTGCAGGCC CCATACTGAG CCTGGACTGA 
3651 AAGTATCCAC TCGGACCATC TGTTATCTCT CTACACTGAA AATAAAACCT 
3701 CTTCCACCCA CCCCATTCGG TTCTTCTGCC TGACCTTCAA ATGCCCATGT 
3751 TGGCCTTTTA CAGCAGTGCC ACGGCACCAA GCGAGCTGCC ACATCTCACA 
3801 CTCTAAAGGG TTTGAACTAT TAGTTCTTGT CATTTTTTAA AAAAAACCAT 
3851 TCCCAAGTGA AATTGTTATA TCGTCTGTCT TGCGTGTGTC AGAACTGGGT 
3901 TTTTGTGGAG GTTCAGAGCA GGCAACACCA TAAGTTGCTC TCAGATCCTT 
3951 GTTCTGAAGT ACATTCTTGG TTATCTGTAC TTCTGTAGCT GGTGTGATGC 
4001 TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG 
4051 AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 131 bp to 3274 bp; peptide length: 1048 
Category: similarity to known protein 



1 MGPPRHPQAG EIEAGGAGGG RRLQVEMSSQ QFPRLGAPST GLSQAPSQIA
51 NSGSAGLINP AATVNDESGR DSEVSAREHM SSSSSLQSRE EKQEPVVVRP
101 YPQVQMLSTH HAVASATPVA VTAPPAHLTP AVPLSFSEGL MKPPPKPTMP
151 SRPIAPAPPS TLSLPPKVPG QVTVTMESSI PQASAIPVAT ISGQQGHPSN
201 LHHIMTTNVQ MSIIRSNAPG PPLHIGASHL PRGAAAAAVM SSSKVTTVLR
251 PTSQLPNAAT AQPAVQHIIH QPIQSRPPVT TSNAIPPAVV ATVSATRAQS
301 PVITTTAAHA TDSALSRPTL SIQHPPSAAI SIQRPAQSRD VTTRITLPSH
351 PALGTPKQQL HTMAQKTIFS TGTPVAAATV APILATNTIP SATTAGSVSH
401 TQAPTSTIVT MTVPSHSSHA TAVTTSNIPV AKVVPQQITH TSPRIQPDYP
451 AERSSLIPIS GHRASPNPVA METRSDNRPS VPVQFQYFLP TYPPSAYPLA
501 AHTYTPITSS VSTIRQYPVS AQAPNSAITA QTGVGVASTV HLNPMQLMTV
551 DASHARHIQG IQPAPISTQG IQPAPIGTPG IQPAPLGTQG IHSATPINTQ
601 GLQPAPMGTQ QPQPEGKTSA VVLADGATIV ANPISNPFSA APAATTVVQT
651 HSQSASTNAP AQGSSPRPSI LRKKPATDGA KPKSEIHVSM ATPVTVSMET
701 VSNQNNDQPT IAVPPTAQQP PPTIPTMIAA ASPPSQPAVA LSTIPGAVPI
751 TPPITTIAAA PPPSVTVGGS LSSVLGPPVP EIKVKEEVEP MDIMRPVSAV
801 PPLATNTVSP SLALLANNLS MPTSDLPPGA SPRKKPRKQQ HVISTEEGDM
851 METNSTDDEK STAKSLLVKA EKRKSPPKEY IDEEGVRYVP VRPRPPITLL
901 RHYRNPWKAA YHHFQRYSDV RVKEEKKAML QEIANQKGVS CRAQGWKVHL
951 CAAQLLQLTN LEHDVYERLT NLQEGIIPKK KAATDDDLHR INELIQGNMQ
1001 RCKLVMDQIS EARDSMLKVL DHKDRVLKLL NKNGTVKKVS KLKRKEKV



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2all, frame 2

SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2).,
Score = 334, P = 2.4e-25

PIR:A43932 mucin 2 precursor, intestinal - human (fragments), N = 1,
Score = 321, P = 3.2e-24

TREMBL:D88440_1 product: "high molecular mass nuclear antigen"; Gallus
gallus mRNA for high molecular mass nuclear antigen, partial cds., N =
1, Score = 312, P = 8.3e-24

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast
(Saccharomyces cerevisiae), N = 1, Score = 300, P = 2.1e-22



>SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 
Length = 5,179

HSPs: 

Score = 334 (50.1 bits), Expect = 2.4e-25, P = 2.4e-25 
Identities = 184/770 (23%), Positives = 263/770 (34%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V pp T + + TVTP TP + +PPPT P 
Sbjct: 3471 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3530 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3531 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3589 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ p p+ + P +++ +TT T TP I 

Sbjct: 3590 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3649 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3650 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3706 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 3707 PTGTQTPTTT P I TTTTT VT PTPT PTGTQT PTTT PI TTTTTVT PT PTPTGTQT PTTT P I TT 3766 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3767 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3825

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ p ++ + +P p +T + + P+ + PT P+ 

Sbjct: 3826 QTPTTT PI TTTTTVT PTPT PTGTQT PT TTP I TTTTTVT PT PT PTG — TQTP 3874 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 3875 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVT 3932 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3933 PTPT PTGTQT PTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVT PTPT PTGTQT PTT 3991 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 3992 TPI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PITTTTTVT PTPT 4051 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T + TPPTQPTP 

Sbjct: 4052 PTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTTPITT 4111 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPS VTVGGSLSSVLGP-PVPEI 782 

p+ TP PIT TT+ PP+ T ++++PPP 

Sbjct: 4112 TTTVT PT PTPTGTQT- PTTT P ITTT -TTVTPT PTPTGTQT PTTT P I TTTTTVT PT PT PTG 4169 

Query: 783 KVKEEVEPMDIMRPVSAVP-PLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

P+ V+ P P T T P+ A + TS+ PP +S + R 

Sbjct: 4170 TQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229 

Query: 842 VISTEEGDMMET 853 

+ TE ++ T 
Sbjct: 4230 PL-TESTTLLST 4240 
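The percentages printed after each Identities and Positives ratio appear to be truncated rather than rounded: 184/770 is about 23.9 % but is reported as 23 %. The sketch below assumes this floor behaviour, which is inferred only from the figures in this listing; `blast_percent` is a hypothetical name.

```python
def blast_percent(matches: int, columns: int) -> int:
    """Integer percentage as printed in the HSP header (assumed to be truncated)."""
    if columns <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // columns


print(blast_percent(184, 770))  # 23 (identities)
print(blast_percent(263, 770))  # 34 (positives)
```

The denominator is the number of alignment columns, gaps included, which is why it can exceed the span of query residues covered by the HSP.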

Score = 328 (49.2 bits), Expect = 1.0e-24, P = 1.0e-24
Identities = 180/745 (24%), Positives = 254/745 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 3540 VT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTTP I TTTTT VTPT PTPTGTQT PT 3599 
Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3600 TT PITTTTT VT PTPT PTGTQT PTTTPITTTTTVT PTPTPTGTQT- PTTTPITTTTTVTPT 3658 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TP I 

Sbjct: 3659 PT PTGTQT PTTTPITTTTTVT PTPTPTGTQT PTTTPITTTTTVTPT PTPTGTQTPTTTPI 3718 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T+T+PTT T + T++ P 
Sbjct: 3719 TTTTTVTPT PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3775 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3776 PTGTQT PTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT PTPTGTQTPTTTPITT 3835 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3836 TTTVTPTPT PTGTQT PTTTPITTTTTVT PTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3894 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +p P +T + + P+ + PT P+ 

Sbjct: 3895 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 3943 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T Q T 

Sbjct: 3944 TTTPITTTTTVTPTPTPTGTQTPTTTPI TTTTTVTPT PTP — TGTQTPTTTPITTTTTVT 4001 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4002 PTPT PTGTQT PTTTP I TTTTTVT PTPT PTGTQ-TPTTT PI TTTTTVTPT PT PTGTQT PTT 4060 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 4061 TPITTTTTVT PTPT PTGTQT PTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 4120 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T+ TPPTQPTP 

Sbjct: 4121 PTGTQT PTTT PI TTTTTVTPT PT PTGTQT PTTTP I TTTTTVTPT PTPTGTQTPTTTPITT 4180 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAA-PPPSVTVGGSLSSVLGPPVPEIKVKEE 787 

p+ TP T PI + + PPP + + S P + 

Sbjct: 4181 TTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSSPLTESTTLLST 4240 

Query: 788 VEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMP — TSDLPPGASPR 833 

+ P M S PP +T T +P+ + LS P T+ PPG R 

Sbjct: 4241 LPPAIEM — TSTAPP-STPT-APTTTSGGHTLSPPPSTTTSPPGTPTR 4284

Score = 325 (48.8 bits), Expect = 2.2e-24, P = 2.2e-24
Identities = 186/782 (23%), Positives = 261/782 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3494 VT PTPT PTGTQT PTTTP I TTTTTVTPTPT PTGTQT PTTTPI TTTTTVT PTPTPTGTQTPT 3553 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT + P T +G Q P+ TT V + 

Sbjct: 3554 TT PI TTTTTVT PTPTPTGTQT PTTTPI TTTTTVT PTPT PTGTQT- PTTTPI TTTTTVTPT 3612 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 3613 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3672 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 3673 TTTTTVTPTPT PTGTQTPTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT 3729 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 3730 PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVTPT PTPTGTQTPTTTPITT 3789 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3790 TTTVTPTPTPTGTQTPTTTPITTTTTVT PTPTPTGTQT PTTT PITTTTTVTPTP-TPTGT 3848 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ p ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3849 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 3897 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 
T TPIT++ +T PQP+ITTV T QT
Sbjct: 3898 TTT PITTTTTVTPTPT PTGTQT PTTT PI TTTTTVTPTPTP--TGTQT PTTT PI TTTTTVT 3955 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3956 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPT PTGTQT PTT 4014 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 4015 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4074 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T p+ TP +T+ TPPTQPTP 

Sbjct: 4075 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4134 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P I V 
Sbjct: 4135 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT — GTQT PTTTPITTTTTV 4184 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQHVISTEEG 848 

P PP T+T +P L+N PSP+ P + + + 

Sbjct: 4185 TPTPTPTGTQTGPPTHTST-APIAELTTSN-PPPESSTPQTSRSTSSPLTESTTLLSTLP 4242 

Query: 849 DMMETNSTDDEKSTAKSLLVKAEKRKSPP 877 

+E ST + SPP 
Sbjct: 4243 PAIEMTSTAPPSTPTAPTTTSGGHTLSPP 4271

Score = 324 (48.6 bits), Expect = 2.6e-24, P = 2.8e-24
Identities = 170/717 (23%), Positives = 248/717 (34%)

Query: 95 PVVVRPYPQVQMLSTHHAVASATP--VAVTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSR 152

p P p +T + +P T PP TP+ P++ + + P P+ P 

Sbjct: 1401 PPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPL-PTTTPSPPIS 1459 

Query: 153 PIAPAPPSTLSLPPKVPGQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

PP+T PP TS +PT+ P I + 

Sbjct: 1460 TTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTPITPPASTTT 1516 

Query: 213 IIRSNAPGPPLHIGASHLPRGAAAAAVMSSSKVTTVLRPTSQ--LPNAATAQPAVQHIIH 270 

+ + PPP +P S T + PTS LP TP 

Sbjct: 1517 LPPTTT PS P PTTTTTTP P P TTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTT 1571 

Query: 271 QPIQSRP-PVTTSNAIPPAVVATVSA-TRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

P + P P TT+ PP+T T SP TTT + S PT + PP++ 

Sbjct: 1572 PPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTS 1631 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNT 388 

+ + T T P P TP T I +T TP T + + T 

Sbjct: 1632 TTTLPPTTTPSPPPTTTTTP— PPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTTP 1689 

Query: 389 IPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAV-TTSNIPVAKVVPQQITHTSPRIQP 447

P TT + S T P+S ITTPS+++ TT P P TT +P 

Sbjct: 1690 SPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPV-QFQYFLPTYPPSAY-P-----LA 500

+ + p+ P T + P VP+ + +L + P+ + P L 

Sbjct: 1750 TTTSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELI 1809 

Query: 501 AHTYTPITSSVSTIR--QYP-VSAQAPNSAITAQTGVG-VASTVHLNPMQLMTVDASHAR 556

P ++ + R YP V + VG + P ++ + A 

Sbjct: 1810 GDVCGPGWAANISCRATMYPDVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPM-AFCLN 1868 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQ-PAPLGTQGIHSATPINTQGLQPAPMGTQQPQ-- 613 

+ +Q TQ P+T + PPTI+T + PP GTQ P 

Sbjct: 1869 YEINVQCCECVTQ PTTMTTTTTEN PT P PTTT P ITTTTTVTPT PT PTGTQT PTTT 1922 

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 1923 PITTTTTVTPTPTPTGTQTPTTT PITTTTTVTPTPT PTGTQT PTTTPITTTTTVTPTPTP 1982 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMIA 729

T P+ TP +T+ TPPTQPTP 

Sbjct: 1983 TGTQT PTTT PITTTTTVTPTPT PTGTQT PTTT PITTTTTVTPTPTPTGTQTPTTT PI TTT 2042 

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ TP PIT TT PP+T G+ + P V 
Sbjct: 2043 TTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT- -GTQT PTTTPITTTTTVTPTPT 2096 

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2097 PTGTQT PTTT- PITTTTTVTPT 2117 
Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V pp T ++TVTP TP + +PPPT P 
Sbjct: 2068 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2127 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2128 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2186 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T T P I 

Sbjct: 2187 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2246 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2247 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2303 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2304 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT PTGTQT PTTTPITT 2363 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2364 TTTVTPTPTPTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTTPITT TTTVTPTP-TPTGT 2422 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2423 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2471 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2472 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2529 

Query: 561 IQPAPISTQGIQPAPIGTPGI-. — QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2530 PTPT PTGTQT PTTTPITTTTTVT PTPTPTGTQ-T PTTTPITTTTTVT PTPT PTGTQT PTT 2588 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2589 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2648 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T + TPPTQPTP 

Sbjct: 2649 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2708

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2709 TTTVTPTPTPTGTQT-PTTTPIT TTTTVT PTPT PT — GTQT PTTT PI TTTTTVTPT P 27 62 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2763 TPTGTQTPTTT- PI TTTTTVTPT 2784 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 
Sbjct: 2206 VT PT PT PTGTQT PTTT P I TTTTT VT PT PT PTGTQT PTTTPI TTTTT VT PT PT PTGTQT PT 2265 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2266 TTPITTTTTVT PTPT PTGTQT PTTTPITTTTTVTPTPT PTGTQT- PTTTPITTTTTVTPT 2324 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + p +++ +TT T TP I 

Sbjct: 2325 PT PTGTQT PTTT P I TTTTT VT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI 2384 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 2385 TTTTTVT PTPTPTGTQT PTTTPI TTTTT VTPTPTPTGTQTPTTTPITTTTTVT PTPT 2441 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2442 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT PTPTGTQTPTTTPITT 2501 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 
T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 
Sbjct: 2502 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2560

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +p p +T + + P+ + PT P+ 

Sbjct: 2561 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG- -TQTP 2609 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 2610 TTTP I TTTTTVT PTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVT 2667 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2668 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2726 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2727 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2786

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T + T P PT Q P. T P 

Sbjct: 2787 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2846 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2847 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2900 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2901 TPTGTQTPTTT - PITTTTTVTPT 2922 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T++TVTP TP ++PPPT P 

Sbjct: 2321 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTP I TTTTTVT PTPTPTGTQTPT 2380 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2381 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVT PT 2439 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TP I 

Sbjct: 2440 PTPTGTQTPTTT PITTTTTVTPT PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2499 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T + + P 
Sbjct: 2500 TTTTTVTPT PTPTGTQTPTTT PITTTTTVTPT PTPTGTQTPTTT PI TTTTTVT PTPT 2556 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2557 PTGTQTPTTT PITTTTTVTPT PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2616 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2617 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2675 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ p ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2676 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2724 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 2725 TTTPITTTTTVTPTPTPTGTQTPTTT PI TTTTTVT PTPTP — TGTQTPTTTPITTTTTVT 2782 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL— QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2783 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2841 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2842 TPI TTTTTVT PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTP ITT TTTVTPTPT 2901 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2902 PTGTQTPTTTP I TTTTTVTPT PTPTGTQTPTTT PI TTTTTVT PTPT PTGTQTPTTT PITT 2961 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2962 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3015 
Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

p P + P T TV+P+ 

Sbjct: 3016 TPTGTQTPTTT-PITTTTTVTPT 3037 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V pp T + + TVTP TP + +PPPT P 
Sbjct: 2390 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2449 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2450 TT P I TTTTT VT PT PT PTGTQT PTTT P I TTTTTVTPT PT PTGTQT- PTTT P I TTTTT VT PT 2508 

Query: 213 IIRSNAPGP---PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P ++ + +TT T T P I 

Sbjct: 2509 PT PTGTQT PTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT PTGTQT PTTTPI 2568 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T+T+PTT T + T++ P 
Sbjct: 2569 TTTTT VTPTPTPTGTQTPTTT PI TTTTT VTPTPT PTGTQT PTTT PI TTTTT VT PTPT 2625 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2626 PTGTQT PTTT P ITTTTTVT PTPT PTGTQT PTTT P I TTTTTVT PT PTPTGTQT PTTT PITT 2685 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2686 TTTVTPT PTPTGTQT PTTT P ITTTTTVT PT PT PTGTQTPTTT P I TTTTTVTPT P -T PTGT 2744 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ +■ +P P +T + + P+ + PT P+ 

Sbjct: 274 5 QT PTTT PI TTTTTVT PTPT PTGTQT PT TT PI TTTTTVT PTPTPTG — TQTP 2793 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 2794 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQT PTTT PI TTTTTVT 2851 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2852 PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQ-TPTTT PI TTTTTVT PTPT PTGTQT PTT 2910 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2911 TPITTTTTVT PTPT PTGTQT PTTT PITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2970 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2971 PTGTQT PTTTPITTTTTVT PTPTPTGTQTPTTTPI TTTTTVTPT PTPTGTQT PTTT PITT 3030 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3031 TTTVTPTPT PTGTQT- PTTT PIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTP 3084 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3085 TPTGTQTPTTT-PITTTTTVTPT 3106 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2459 VTPTPTPTGTQT PTTTPITTTTTVT PTPTPTGTQTPTTTPI TTTTTVT PTPT PTGTQT PT 2518 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2519 TTP I TTTTTVT PT PT PTGTQT PTTTP I TTTTTVTPT PT PTGTQT - PTTT P I TTTTTVTPT 2577 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ p P+ + P +++ +TT T TPI 

Sbjct: 2578 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2637 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2638 TTTTTVT PTPT PTGTQT PTTTP I TTTTTVTPTPT PTGTQT PTTTPITTTTTVT PTPT 2694 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 2695 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2754 
Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

SbjCt: 2755 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2813 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +p P +T + + P+ + PT P+ 

SbjCt: 2814 QTPTTTPITTTTTVT PTPTPTGTQTPT TT PI TTTTT VT PT PT PTG — TQTP 2862 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 2863 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVT 2920 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

SbjCt: 2921 PTPTPTGTQTPTTTPI TTTTTVT PTPT PTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2979 

Query: 614 -PEGKTSAVVLADGATI VANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2980 TPITTTTTVTPTPTPTGTQT PTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3039 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T + TPPTQPTP 

Sbjct: 3040 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3099 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3100 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTP 3153 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3154 TPTGTQTPTTT-PITTTTTVTPT 3175 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2528 VTPTPTPTGTQTPTTTP I TTTTTVTPTPTPTGTQTPTTT PI TTTTTVT PTPTPTGTQTPT 2587 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

SbjCt: 2588 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPI TTTTTVT PT 2646 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2647 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2706 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T + + P 
Sbjct: 2707 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTP I TTTTTVT PTPT 2763 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-I FSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2764 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI TTTTTVTPTPTPTGTQTPTTT PITT 2823 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2824 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2882 

Query: 444 RIQPDYPAERSSLI PISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +p p +T + + P+ + PT P+ 

Sbjct: 2883 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVT PTPT PTG — TQTP 2931 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2932 TTT PI TTTTTVT PT PTPTGTQTPTTT PI TTTTTVT PTPTP — TGTQTPTTTPITTTTTVT 2989 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2990 PTPTPTGTQTPTTT PI TTTTTVT PTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3048 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 3049 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3108 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3109 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3168 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 
P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3169 TTT VT PT PT PTGTQT- PTTT P I T TTTTVTPTPTPT — GTQTPTTTPITTTTTVT PTP 3222 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3223 TPTGTQTPTTT-PITTTTTVTPT 3244 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

VPP T+ + TVTP TP ++PPPT P 

Sbjct: 3080 VT PTPT PTGTQT PTTT P I TTTTTVT PT PTPTGTQT PTTTP I TTTTTVT PT PT PTGTQT PT 3139 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V I 

Sbjct: 3140 TTP I TTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT PTGTQT -PTTT PI TTTTTVTPT 3198 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3199 PTPTGTQT PTTT PI TTTTTVTPT PTPTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTTPI 3258 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3259 TTTTTVT PTPT PTGTQT PTTT PI TTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3315 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT — VAPILA 385 

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 3316 PTGTQT PTTT PI TTTTTVT PT PTPTGTQT PTTTP I TTTTTVTPT PTPTGTQT PTTTP ITT 3375 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3376 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3434 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +p p +T + + P+ + PT P+ 

Sbjct: 3435 QT PTTTP I TTTTTVT PT PT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTP 3483 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
Sbjct: 3484 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQT PTTTPI TTTTTVT 3541 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3542 PTPT PTGTQT PTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPT PTGTQT PTT 3600 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 3601 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3660 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3661 PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PITT 3720 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3721 TTT VT PT PT PTGTQT - PTTT P I T TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTP 3774 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3775 TPTGTQTPTTT-PITTTTTVTPT 3796 

Score = 313 (47.0 bits), Expect = 4.2e-23, P = 4.2e-23 
Identities = 169/695 (24%), Positives = 245/695 (35%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3655 VTPT PT PTGTQT PTTT PI TTTTTVT PTPT PTGTQTPTTT P I TTTTTVT PTPT PTGTQT PT 3714 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 3715 TTPITTTTTVTPT PTPTGTQT PTTTPITTTTTVT PTPT PTGTQT- PTTT PITTTTT VTPT 3773 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3774 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3833 

Query: 269 IHQPIQSRPPVTTSNAI PPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3834 TTTTTVTPT PT PTGTQT PTTT P ITTTTTVT PT PT PTGTQT PTTTP ITTTTTVT PTPT 3890 
Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

SbjCt: 3891 PTGTQT PTTT P I TTTTTVT PT PTPTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TT 3950 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

SbjCt: 3951 TTTVTPT PTPTGTQT PTTTPITTTTTVTPT PTPTGTQT PTTTPITTTTTVTPTP-TPTGT 4009 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +P P +T + + P+ + PT P+ 

SbjCt: 4010 QTPTTTPITTTTTVT PTPT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTP 4058 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 
SbjCt: 4059 TTTPITTTTTVTPTPT PTGTQT PTTTPITTTTTVTPTPTP — TGTQTPTTTP I TTTTTVT 4116 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

SbjCt: 4117 PTPT PTGTQT PTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPT- 4174 

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKK 674 

T+ + T+ P P T ++ ++N P + S+P+ S 

SbjCt: 4175 TTPITTT--TTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229 

Query: 675 PATDGAKPKSEIH— VSMATPVTVSMETVSNQNNDQPTIAVPP-TAQQPP— PTIPTMIA 729 

PT+ S++M+ ST + T++ PP T PP PT T 

SbjCt: 4230 PLTESTTLLSTLPPAIEMTSTAPPSTPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTG 4289 

Query: 730 AASPPSQPAVALSTI PGAVPITPP--ITTIAAAP-PPSVTVGGSLSSVLGPPVPEI 782 

++S P+ V +T P P++ PIT P P SV + L+ P E+ 

SbjCt: 4290 SSSAPTPSTVQTTTTSAWTPTPTPLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV 4349 

Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19 
Identities = 138/540 (25%), Positives = 194/540 (35%) 

Query: 278 PVTTSNAIPPAVVATVSATRAQSPVITTTAAH ATDSALSRP — TLSIQHPPSAA 329 

P+TT+ + P T + T +P+ TTT T + + P T + P 

Sbjct: 1946 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2005 

Query: 330 ISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT— VAPILAT 386 

Q P + TT P+ GT + T + T TP T PI T 

SbjCt: 2006 TGTQT PTTTPITTTTTVTPT PTPTGTQT PTTTPITTTTTVTPT PTPTGTQT PTTTPITTT 2065 

Query: 387 NTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSPR 444 

T+ P+ T G+ + T P +T T+T P++ TTT VP TT+ 

SbjCt: 2066 TTVT PTPT PTGTQT PTTTPITTTTTVTPTPT PTGTQT PTTTPITT TTT VTPTP-TPTGTQ 2124 

Query: 445 IQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHT 503 

p ++ + +P p +T + + P+ + PT P+ T 

SbjCt: 2125 T PTTT PI TTTTTVT PTPT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTPT 2173 

Query: 504 YTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGI 561 

TPIT++ +T PQP+ITTV T QT 
SbjCt: 2174 TT PI TTTTTVT PTPTPTGTQT PTTTPITTTTTVTPTPTP — TGTQT PTTT P I TTTTTVTP 2231 

Query: 562 QPAPISTQGIQPAPIGT PGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ-- 613 

p p TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2232 TPT PTGTQT PTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVT PTPT PTGTQT PTTT 2290 

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2291 PI TTTTTVT PT PT PTGTQT PTTT P I TTTTTVTPT PT PTGTQT PTTT PI TTTTTVT PT PT P 2350 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMIA 729 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2351 TGTQTPTTTPITTTTTVT PTPTPTGTQT PTTTPITTTTTVTPT PTPTGTQT PTTTPITTT 2410 

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

p+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2411 TTVT PTPT PTGTQT -PTTT PIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTPT 2464 

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2465 PTGTQT PTTT -PI TTTTTVTPT 2485 

Score = 265 (39.8 bits), Expect = 5.8e-18, P = 5.8e-18 
Identities = 179/746 (23%), Positives = 257/746 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3678 VT PTPTPTGTQTPTTTPITTTTTVT PTPTPTGTQT PTTTPITTTTTVTPT PTPTGTQTPT 3737 
Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3738 TTPITTtTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3796 

Query: 213 IIRSNAPGP— -PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3797 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3856 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3857 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3913 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3914 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3973 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3974 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4032 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p ++ + +p P +T + + P+ + PT P + 

Sbjct: 4033 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 4081 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 4082 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVT 4139 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4140 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTGPP 4198 

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPA-— ATTVVQTHSQSA-STNAPA--QGSSPRP 668 

TS +A+ T +NP P S+ P +T+ T S + ST PA S+ P 
Sbjct: 4199 T-HTSTAPIAELTT--SNP--PPESSTPQTSRSTSSPLTESTTLLSTLPPAIEMTSTAPP 4253 

Query: 669 SILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMI 728 

S TG S + +P +++PT+T TPT 

Sbjct: 4254 STPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTSAWT-PTPT 4312 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

++p L P +V I + AP V G+ + E 

Sbjct: 4313 PLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV-YNGTYGDTCYFVNCSLSCTLEF 4371 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

S P + +T +PS ++ S PT P P P +Q++ 
Sbjct: 4372 YNWSCPSTPSPTPTPSKSTPTPSKP — SSTPSKPTPGTKPPECPDFDPPRQEN 4422 

Score = 254 (38.1 bits), Expect = 8.7e-17, P = 8.7e-17 
Identities = 167/697 (23%), Positives = 245/697 (35%) 

Query: 115 SATPVAVTAPPAHLTPAVPLSFSEGLMKPPPK--PTMPSR-PIAPAPPSTLSLPPKV-PG 170 

S + T PP TP+ P + + PPP P+ P+ PI P P ST +LPP P 

Sbjct: 1587 SPPTITTTTPPPTTTPSPPTTTTT TPPPTTTPSPPTTTPITP-PTSTTTLPPTTTPS 1642 

Query: 171 QVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHL 230 

T + P + PT+ + TT I + PPP + 

Sbjct: 1643 PPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPI--TTTPSPPTTTMTTPS 1700 

Query: 231 PRGAAAAAVMSSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQS-RPPVTTSNAIPPAV 289 

P SS +TT P+S + P P + PP TT +PP 

Sbjct: 1701 P TTTPSSPITTTTTPSS TTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPPTT 1751 

Query: 290 VATVSATRAQSPVITT-TAAHATDSALSRPTLSIQH PPSAAISIQRPAQSRDVTTR 344 

++ t PITT++++P+++ S ++P ++ 

Sbjct: 1752 TSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELIGD 1811 

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATN TIPSATTAGS 397 

+ PA++++ IGV ++N IP A 

Sbjct: 1812 VCG PGWAAN I SC RATM Y P--DVPIGQLGQTVVCDVSVGLICKNE DQK PGG V I PMA FC LN Y 1869 

Query: 398 VSHTQAPTSTI--VTMTVPSHSSHATAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSS 455 

+ Q TMT + + + T TT+ I V T T + P ++ 

SbjCt: 1870 EINVQCCECVTQPTTMTTTT-TENPTPPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTT 1928 

Query: 456 LIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHTYTPITSSVS-T 513 

+ +P P +T + + P+ + PT P+ T TPIT++ + T 

Sbjct: 1929 TVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG- -TQTPTTTPITTTTTVT 1977 

Query: 514 IRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQ 572 

PQP+ITTV T QT PPTQ 

Sbjct: 1978 PTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2035 

Query: 573 PAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ--PEGKTSAVVLA 624 

PI T P P GTQ + TPI T P P GTQ P P T+ V 

Sbjct: 2036 TTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT 2094 

Query: 625 DGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPK 683 

T P+P+ TT T +Q+ +T ++ P+ TP 

Sbjct: 2095 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2154 

Query: 684 SEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMIAAASPPSQPAVA 740 

+ TP +T + T P PT Q P T P P+ 

Sbjct: 2155 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 2214 

Query: 741 LSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAV 800 

T P PIT TT P P+ T G+ + P V P P + 

Sbjct: 2215 TQT-PTTTPIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTPTPTGTQTPTTT- 2267 

Query: 801 PPLATNTVSPS 811 

P T TV+P+ 

Sbjct: 2268 PITTTTTVTPT 2278 

Score = 243 (36.5 bits), Expect = 1.3e-15, P = 1.3e-15 
Identities = 110/406 (27%), Positives = 154/406 (37%) 

Query: 121 VTAP-PAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESS 179 

+T P P TP+ P + + L P P+ P+ PP+T PP T + ++ 

Sbjct: 1396 ITTPSPPTTTPSPPPTTTTTL-PPTTTPSPPTTTTTTPPPTTTPSPPITT — TTTPLPTT 1452 

Query: 180 IPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAV 239 

P P++T + P+ TT + P PP + P 

Sbjct: 1453 TPSPPISTTTTPP — PTTTPSPPTTTPSPP TTTPSPPTTTTTTPPP TT 1498 

Query: 240 MSSSKVTTVLRP---TSQLPNAATAQPAVQHIIHQPIQSRP-PVTTSNAIPPAVVATVSA 295 

S +TT +P T+LP TP P+PP TT+ PP T+ 

Sbjct: 1499 TPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPP 1558 

Query: 296 TRAQSPVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDV-TTRITLPSHPALG 354 

T SP TTT + S PT + PP+ + P + TT T P P 

Sbjct: 1559 TTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTP — PPTT 1616 

Query: 355 TPKQQLHTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVP 414 

TP T +T P T +P T T P TT S T P+ I T T P 

Sbjct: 1617 TPSPPTTTPITPPTSTTTLP-PTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTP 1675 

Query: 415 SHSSHATA-VTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMET 473 

++ ++ +TT+ P + TSPPP++PS SPPMT 

Sbjct: 1676 PPTTTPSSPITTTPSPPTTTM TTPSPTTTPSSPITTTTT-PSSTTTPSPPPTTMTT 1730 

Query: 474 RSDNR-PSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNS 526 

S PS P LP S+ PL T TP+ S++ PS P + 

Sbjct: 1731 PSPTTTPSPPTTTMTTLPPTTTSS-PL TTTPLPPSITPPTFSPFSTTTPTT 1780 

Score = 189 (28.4 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 92/374 (24%), Positives = 133/374 (35%) 

Query: 439 THTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYF-LPTYPPSAY 497 

T + P P P ++ +P + + P PS P+ LPT PS 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPT-TTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSP- 1456 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556 

P++ T P T++ S P S T T +T PM +T AS 

Sbjct: 1457 PISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTT 1516 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P +T P P TP +P T I P +T L P T P P 

Sbjct: 1517 LPPTTTPSP PTTTTTT P P PTTT P SPPTTTPI — TPPTSTTTLPP TTTPSPPP 1566 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP--AQGSSPRPSILRKK 674 

T+ T +P P + P+ T+ T +T +P ++P P+ 

Sbjct: 1567 TTTTT PPPTTTPSP PTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSP 1620 

Query: 675 PATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAV-PPTAQQPPPTIPTMIAA — A 731 

PT P+ + PT + PT PPT P P I T 

Sbjct: 1621 PTTTPITPPTS--TTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPT 1678 

Query: 732 SPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPV PEIKVK 785 

+ PS P + P TP TT ++P + T S ++ PP P 

Sbjct: 1679 TTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPS 1738 
Query: 786 EEVEPMDIMRPVSAVPPLATNTVSPSL 812 

M + P + PL T + PS+ 
Sbjct: 1739 PPTTTMTTLPPTTTSSPLTTTPLPPSI 1765 

Score = 185 (27.8 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 71/270 (26%), Positives = 99/270 (36%) 

Query: 563 PAPISTQGIQPAPIGTPGIQ PA PLGTQG I HSATP I NTQGLQ PA PMGTQQ PQ PEG 616 

P+P +T P P TP P T + + TP I+T P P T P P 
Sbjct: 1422 PSPPTTTTTTPPPTTTPS-PPITTTTTPLPTTTPSPPISTT-TTPPPTTTPSPPTTTPSP 1479 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ T P + P +P TT + T S +T P SP + P 
Sbjct: 1480 PTTTPSPPTTTTTTPPPTTTP SPPMTTPI-TPPASTTTLPPTTTPSPPTTTTTTPPP 1535 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQ 736 

T p + TP+T T + P+ P T PPPT + PS 

Sbjct: 1536 TTTPSPPT TTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTTPSP 1588 

Query: 737 PAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRP 796 

p + +T P +PP TT PPP+ T ++ + PP + P P 

Sbjct: 1589 PTITTTTPPPTTTPSPPTTT-TTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSP — PP 1645 

Query: 797 VSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

+ P T T SP + T+ PP +P 

Sbjct: 1646 TTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTP 1681 

Score = 183 (27.5 bits), Expect = 3.4e-09, P = 3.4e-09 
Identities = 91/390 (23%), Positives = 139/390 (35%) 

Query: 326 PSAAISIQRPAQSRDVTTR-ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPIL 384 

PS + P+ T TPSPT T I+T TP+ T +P + 

Sbjct: 1399 PSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSPPI 1458 

Query: 385 ATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIP— VAKVVPQQITHTS 442 

+T T P TT S T P+ T + P+ ++ TT+ P + P T T 

Sbjct: 1459 STTTTPPPTTTPSPP-TTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTTL 1517 

Query: 443 PRIQPDYPAERSSLIPISGHRASP NPVAMETRSDNRP — SVPVQFQYFLPTYPPSAY 497 

p p++P SPP+T+P+P T PP+ 

Sbjct: 1518 PPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTTPPPTTT 1577 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556 

p T TP +++T P + +P T T +T P +T S 

Sbjct: 1578 PSPPTTTTPSPPTITTTTPPPTTTPSPP TTTTTT P P PTTT PS PPTTT P I T P PTSTTT 1634 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P TPPTPPPT TT P P 

Sbjct: 1635 LPPTTTPSPPPTTTTTPPPTTTPS — P-PTTTTPSPPITTTTTPPPTTTPSSPITTTPSP 1691 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ + T ++PI+ + P++TT + +T +P SP + + P 

Sbjct: 1692 PTTTMTTPSPTTTPSSPITT — TTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPP 715 

T + P + + P +++ T S + PT P 
Sbjct: 1750 TTTSSPLT TTPLPPSITPPTFSPFSTTTPTTPCVP 1784 

Score = 176 (26.4 bits), Expect = 1.8e-07, P = 1.8e-07 
Identities = 101/402 (25%), Positives = 142/402 (35%) 

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAP 404 

IT PS P TP T +T +P TP T PTT+ TP 

Sbjct: 1396 ITTPSPPTT-TPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTP 1454 

Query: 405 TSTIVTMTVPSHSSHATAVTTS-NIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHR 463 

+ I T T P ++ + TT+ + P P T T+P P PI+ 

Sbjct: 1455 SPPISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTP--PPTTTPSPPMTTPITPP- 1511 

Query: 464 ASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQA 523 

AS + T PS P T PP+ P + T TPIT ST P + + 

Sbjct: 1512 ASTTTLPPTTT PSPPTTTT TTPPPTTTP-SPPTTTPITPPTSTTTLPPTTTPS 1563 

Query: 524 PNSAITAQ TGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTP 579 

p t T +T +P + T P+P +T P P TP 

Sbjct: 1564 PPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTT TPSPPTTTTTTPPPTTTP 1618 

Query: 580 G IQPAPLGTQGIHSAT PINTQGLQPAPMGTQQPQPEGKTSAVVLADGATIV 630 

IPPT + T PT PPTP S + 

Sbjct: 1619 SPPTTTPITP-PTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPP 1677 




Query: 631 ANPISNPFSAAPAA-TTVVQTHSQSASTNAP-AQGSSPRPSILRKKPATDGAKPKSEIHV 688 

S+P + P+ TT + T S + + ++P ++P + P T P 
Sbjct: 1678 TTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP T 1734 

Query: 689 SMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPG 746 

+ +P T +M T+ P P PPT + + P+ P V L G 

Sbjct: 1735 TTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITPPTFSPF--STTTPTTPCVPLCNWTG 1790 

Score = 168 (25.2 bits), Expect = 9.3e-08, P = 9.3e-08 
Identities = 89/387 (22%), Positives = 133/387 (34%) 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPI 507 

DY + P+ +P+P T + + PP PTPSP TP 

Sbjct: 1381 DYKIRVNCCWPMDKCITTPSP PTTTPSPP — PTTTTTLPPTTTPSP-PTTTTTTPPP 1434 

Query: 508 TSSVS— TIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPA 564 

T++ S T P+ P+ 1+ T +T P T + P+ 
Sbjct: 1435 TTTPSPPITTTTTPLPTTTPSPPISTTTTPPPTTT PSPPTTTPSPPTT TPS 1485 

Query: 565 PISTQGIQPAPIGTPGI-QPAPLGTQGIHSATPINTQGLQPAPMGTQQPQ PEGKTSA 620 

P +T P P TP P+ + P T P TP P T+ 

Sbjct: 1486 PPTTTTTTPPPTTTPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTT 1545 

Query: 621 VVLADGATIVANPISNPFSAAPAATTVVQTHSQSA-STNAPAQGS SPRPSILRKKP 675 

+ +T P + P TT T + S +T P+ + +P P+ P 

Sbjct: 1546 PITPPTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPP 1605 

Query: 676 ATDGAKPKSEIHVS--MATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASP 733 

TP S TP+T T + P+ P T PPPT + 

Sbjct: 1606 TTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTT 1664 

Query: 734 PSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGP PVPEIKVKEEVE 789 

PS P +T P + PITT + P ++T ++ P P 

Sbjct: 1665 PSPPITTTTTPPPTTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPP 1724 

Query: 790 PMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

p + P P T+L ++T+ LPP +P 

Sbjct: 1725 PTTMTTPSPTTTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITP 1767 

Score = 154 (23.1 bits), Expect = 2.7e-06, P = 2.7e-06 
Identities = 70/277 (25%), Positives = 92/277 (33%) 

Query: 565 PISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAVVLA 624 

PIST PPTPPPT + TP PTPPT + 

Sbjct: 1457 PISTT-TTPPPTTTPS — P-PTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTP — ITP 1510 

Query: 625 DGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP AQGSSPRPSILRKKPATDGA 680 

+T P+P TT T + S T P ++ P+ P T 

Sbjct: 1511 PASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTT 1570 

Query: 681 KPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQ — PPPTIPTMIAAASPPSQPA 738 

P S T T S T++ T PPT PPPT T + P P 

Sbjct: 1571 TPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTT-TPSPPTTTPITPP 1629 

Query: 739 VALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVS 798 

+ +T+P +PP TT PPP+ T ++ PP+ + 

Sbjct: 1630 TSTTTLPPTTTPSPPPTT-TTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTT 1688 

Query: 799 AVPPLATNTV S PSLALLANNL- -SMPTSDLPPGAS PRKKP 836 

PP T T +PS + S T PP P 

Sbjct: 1689 PSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP 1733 

Score = 148 (22.2 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 62/254 (24%), Positives = 89/254 (35%) 

Query: 583 PAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAV VLADGATIVANPISNP 637 

P+P T SPTLP TPPT+ +T P+ 

Sbjct: 1399 PSPPTTTP — SPPPTTTTTLPP TTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTT 1452 

Query: 638 FSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPKSEIHVS--MATPVT 695 

+ P +TT T+++P SPP+ PT P SM TP+T 

Sbjct: 1453 TPSPPISTTT--TPPPTTTPSPPTTTPSP-PTTTPSPPTTTTTTPPPTTTPSPPMTTPIT 1509 

Query: 696 VSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPIT 755 

T + P+ T PP T P+ + P P + +T+P +PP T 

Sbjct: 1510 PPASTTTLPPTTTPSPPTTTTTTPPPTTTPS— PPTTTPITPPTSTTTLPPTTTPSPPPT 1567 

Query: 756 TIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALL 815 

T PPP+ T ++ PP + PP T P+ + 

Sbjct: 1568 T-TTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPI 1626 
Query: 816 ANNLSMPTSDLPPGASPRKKP 836 

S T+ LPP +P P 
SbjCt: 1627 TPPTS — TTTLPPTTTPSPPP 1645 

Score = 131 (19.7 bits), Expect = 1.2e-03, P = 1.2e-03 
Identities = 112/492 (22%), Positives = 174/492 (35%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

SbjCt: 3977 VTPT PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 4036 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAI PVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

SbjCt: 4037 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 4095 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

SbjCt: 4096 PT PTGTQT PTTT PI TTTTT VTPT PTPTGTQT PTTT PITTTTTVT PT PT PTGTQT PTTTP I 4155 

Query: 269 IHQPIQSRPPVTTSNAIPPA — VVATVSATRAQSPVITTTA — AHATDSALSRPTLSIQH 324 

+ PT P +T + T +PTT H + + ++TS 
SbjCt: 4156 TTTTT VTPT PTPTGTQT PTTTP I TTTTTVTPTPTPTGTQTGPPTHTSTA PI AELTTSNPP 4215 

Query: 325 PPSAAISIQRPAQS--RDVTTRI-TLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVA 381 

p S+ R S + TT + TLP PA+ + T T + T T++ 
SbjCt: 4216 PESSTPQTSRSTSSPLTESTTLLSTLP — PAI EMTSTAPPSTPTAPTTTSGGHTLS 4269 

Query: 382 PILATNTIPSAT-TAGSVS-HTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVVPQQIT 439 

P +TTP TTG++ + APT + V T S A T + P4-+ P I 

SbjCt: 4270 PPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTS AWTPTPTPLS— TPSIIR 4321 

Query: 440 HTSPRIQPDYPAERSSLIPISGHRASPNP-VAMETRSDN RPSVPVQFQYFLPTYP- 493 

T ++P YP+ ++ +P V T D S+ +++ + P 

Sbjct: 4322 TTG — LRP-YPSSVLICCVLNDTYYAPGEEVYNGTYGDTCYFVNCSLSCTLEFYNWSCPS 4378 

Query: 494 -PSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDA 552 

PSP+ + TPSS+ P P T L +T 

SbjCt: 4379 TPSPTPTPSKS-TPTPSKPSSTPSKPTPGTKPPECPDFDPPRQENETWWLCDCFMATCKY 4437 

Query: 553 SHARHIQGIQ PAPISTQGIQPAPIGTP 579 

++ I ++ P P + G+QP + P 
Sbjct: 4438 NNTVEIVKVECEPPPMPTCSNGLQPVRVEDP 4468 

Score = 117 (17.6 bits), Expect = 1.8e-02, P = 1.8e-02 
Identities = 41/156 (26%), Positives = 55/156 (35%) 

Query: 710 TIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGG 769 

T + P T PPPT T + + PS P +T P +PPITT P P+ T 

SbjCt: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITT-TTTPLPTTTPSP 1456 

Query: 770 SLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPG 829 

+S+ PP P P + P T T SP T+ PP 

SbjCt: 1457 PISTTTTPP PTTTPSPPTTTPSPPTTTPSPPTTTTTTP-PPTTTPSPPM 1504 

Query: 830 ASPRKKPRKQQHVISTEEGDMMETNSTDDEKSTAKS 865 

+P P + T T +T +T S 

SbjCt: 1505 TTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPS 1540 

Score = 61 (9.2 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 23/93 (24%), Positives = 41/93 (44%) 

Query: 397 SVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVV PQQITHTSPRIQPDYPAE 452 

S++ + +T T+T+P+ + T TT+ P •»- V P+ SI D+P+ 

Sbjct: 1257 SITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKLCCLWSDWINEDHPSS 1316 

Query: 453 RSS LIPISGHRASPNPVAMETRSDNRPSVPVQ 484 

S P G +P + E RS P + ++ 

Sbjct: 1317 GSDDGDREPFDGVCGAPEDI--ECRSVKDPHLSLE 1349 

Score = 50 (7.5 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 16/41 (39%), Positives = 19/41 (46%) 

Query: 334 RPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTP 374 

RP+ TT ITLP+ P T T T+ ST TP 

Sbjct: 1261 RPSTLTTFTT-ITLPTTPTSFTTTTTTTTPTSSTVLST-TP 1299 

Score = 46 (6.9 bits), Expect = 5.4e-08, P = 5.4e-08 
Identities = 24/106 (22%), Positives = 37/106 (34%) 

Query: 324 HPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPI 383 
+ PP A++ ++ST+PGQA G I 




Sbjct: 1196 YPPGASVPTEETCKSCVCTNSSQVVCRPEEGKILNQTQDGAFCYWEICGPNGTVEKHFNI 1255 

Query: 384 LATNTIPSA-TTAGSVSHTQAPTSTI VTMTVPSHSSHATAVTTSNI 428 

+ T PS TT +++ PTS T T + +S TT + 

SbjCt: 1256 CSITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKL 1301 

Score = 44 (6.6 bits), Expect = 8.7e-08, P = 8.7e-08 
Identities = 14/34 (41%), Positives = 17/34 (50%) 

Query: 478 RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV 511 

RPS F LPT P S + T TP +S+V 

SbjCt: 1261 RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV 1294 



Pedant information for DKFZphtes3_2all, frame 2 
[LENGTH] 1048 

[MW] 110324.04 

[pI] 9.83 

[HOMOL] PIR:147141 gastric mucin (clone PGM-2A) - pig (fragment) 6e-15 

[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR420w] 4e-09 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR420w] 4e-09 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 4e-06 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGR014w] 1e-05 

[FUNCAT] 11.01 stress response [S. cerevisiae, YHL028w] 1e-04 

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YHL028w] 1e-04 

[EC] 3.2.1.3 Glucan 1,4-alpha-glucosidase 3e-08 

[PIRKW] glycosidase 3e-08 

[PIRKW] transmembrane protein 3e-08 

[PIRKW] polysaccharide degradation 3e-08 

[PIRKW] glycoprotein 9e-08 

[PIRKW] calcium binding 9e-08 

[PIRKW] hydrolase 3e-08 

[PIRKW] cytoskeleton 7e-08 

[SUPFAM] equine herpesvirus glycoprotein X 2e-07 

[SUPFAM] yeast glucan 1,4-alpha-glucosidase homolog 3e-08 

[SUPFAM] polymorphic epithelial mucin 7e-08 

[SUPFAM] glucan 1,4-alpha-glucosidase homology 3e-08 

[SUPFAM] equine herpesvirus 1 glycoprotein homology 2e-07 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] PKC_PHOSPHO_SITE 12 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Irregular 

[KW] LOW_COMPLEXITY 20.04 % 

SEQ MGPPRHPQAGEIEAGGAGGGRRLQVEMSSQQFPRLGAPSTGLSQAPSQIANSGSAGLINP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeeccccccccccccccccccccccccccccccccc 

SEQ AATVNDESGRDSEVSAREHMSSSSSLQSREEKQEPVWRPYPQVQMLSTHHAVASATPVA 
SEG xxxxx xxxxxxxxxxxx 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ VTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESSI 

SEG xxxxxxxxxxxxx xxxxxxxxxx . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccceeeccccc 

SEQ PQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAVM 

SEG xxxxx.. 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQSRPPVTTSNAIPPAVVATVSATRAQS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQL 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ HTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHA 

SEG xxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPS 

SEG xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeecccccccc 

SEQ VPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQ 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GLQPAPMGTQQPQPEGKTSAVVLADGATIVANPISNPFSAAPAATTWQTHSQSASTNAP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AQGSSPRPSILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccceeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ EIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQ 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HVISTEEGDMMETNSTDDEKSTAKSLLVKAEKRKSPPKEYIDEEGVRYVPVRPRPPITLL 

SEG xxxxxxxxxxx. . . . 

PRD ccccccccccccccccccccchhhhhhhhhccccccccccccccccccccccccccccee 

SEQ RHYRNPWKAAYHHFQRYSDVRVKEEKKAMLQEIANQKGVSCRAQGWKVHLCAAQLLQLTN 

SEG 

PRD eeccccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeecccceeehhhhhhhhhhc 

SEQ LEHDVYERLTNLQEGIIPKKKAATDDDLHRINELIQGNMQRCKLVMDQISEARDSMLKVL 

SEG 

PRD cchhhhhhhhhhhceeeeccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DHKDRVLKLLNKNGTVKKVSKLKRKEKV 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhccccceeeeeeeeccccc 
PS00001  818->822    ASN_GLYCOSYLATION  PDOC00001 
PS00001  854->858    ASN_GLYCOSYLATION  PDOC00001 
PS00001  1033->1037  ASN_GLYCOSYLATION  PDOC00001 
PS00004  872->876    CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  1037->1041  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  68->71      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  75->78      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  242->245    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  342->345    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  355->358    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  442->445    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  513->516    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  665->668    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  831->834    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  862->865    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  940->943    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  1035->1038  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  63->67      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  68->72      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  75->79      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  88->92      CK2_PHOSPHO_SITE   PDOC00006 
PS00006  135->139    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  473->477    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  844->848    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  855->859    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  959->963    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  984->988    CK2_PHOSPHO_SITE   PDOC00006 
PS00008  15->21      MYRISTYL           PDOC00008 
PS00008  16->22      MYRISTYL           PDOC00008 
PS00008  36->42      MYRISTYL           PDOC00008 
PS00008  233->239    MYRISTYL           PDOC00008 
PS00008  372->378    MYRISTYL           PDOC00008 
PS00008  533->539    MYRISTYL           PDOC00008 
PS00008  535->541    MYRISTYL           PDOC00008 
PS00008  590->596    MYRISTYL           PDOC00008 
PS00008  768->774    MYRISTYL           PDOC00008 
PS00009  19->23      AMIDATION          PDOC00009 



(No Pfam data available for DKFZphtes3_2all . 2 ) 






DKFZphtes3_2al7 



group: metabolism 

DKFZphtes3_2al7 encodes a novel 574 amino acid protein without similarity to known proteins. 

The novel protein contains a thiol protease cysteine pattern. Eukaryotic thiol proteases (EC 
3.4.22.-) are a family of proteolytic enzymes containing an active-site cysteine. Cathepsins 
belong to this protease family. 

The new protein can find application in modulation of proteolytic processes and as a new 
enzyme for proteomic analysis and biotechnological production processes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2312 bp 

Poly A stretch at pos. 2300, polyadenylation signal at pos. 2273 



1 GTTTTCACCT GATCATTAGA AACTAATGAA ACACCTTTTA AGTCTTATGA 
51 ATTCAGGTTA CACTGTTTTC CAGATGCCTT GGCAGCTGGT ACAGGGCCTC 
101 TGAAAAATGG AACCAAATTC TCTGAGGACT AAAGTCCCAG CTTTCTTATC 
151 TGATTTGGGG AAGGCCACAT TGAGGGGAAT CAGAAAGTGT CCCCGATGTG 
201 GCACATACAA TGGAACCCGG GGACTGAGCT GTAAGAACAA GACATGTGGA 
251 ACCATATTCC GCTACGGTGC ACGCAAGCAG CCTAGTGTTG AAGCTGTCAA 
301 AATCATTACA GGCTCTGATC TTCAGGTCTA CTCAGTGCGG CAAAGAGACC 
351 GGGGCCCTGA TTACCGATGC TTTGTGGAGC TCGGGGTTTC AGAGACAACA 
401 ATCCAGACAG TGGATGGGAC GATCATCACT CAGCTGAGCT CTGGACGGTG 
451 TTATGTCCCC TCATGCCTGA AAGCTGCCAC TCAAGGCGTT GTGGAAAACC 
501 AGTGCCAGCA CATCAAGCTG GCGGTGAACT GCCAGGCAGA GGCCACCCCT 
551 CTGACCCTGA AGAGCTCGGT CCTGAATGCA ATGCAGGCCT CCCCGGAAAC 
601 CAAACAGACC ATCTGGCAGT TGGCCACGGA ACCCACAGGT CCTCTGGTGC 
651 AGAGAATTAC TAAAAACATC TTGGTGGTGA AATGCAAGGC AAGCCAGAAG 
701 CACAGTTTGG GGTATTTGCA TACATCTTTT GTGCAGAAAG TCAGTGGCAA 
751 AAGCTTGCCT GAGCGCCGCT TCTTCTGCTC CTGTCAGACT CTGAAATCGC 
801 ACAAGTCAAA TGCCTCCAAG GATGAGACAG CCCAGAGATG CATTCATTTC 
851 TTTGCTTGCA TCTGTGCCTT TGCCAGTGAT GAGACACTGG CTCAGGAATT 
901 CTCAGACTTC CTAAATTTTG ATTCCAGCGG TCTTAAAGAG ATTATTGTAC 
951 CCCAGTTAGG TTGCCATTCA GAATCAACAG TATCTGCTTG TGAGTCTACT 
1001 GCCTCTAAGT CAAAGAAGAG GAGAAAGGAT GAAGTATCTG GTGCACAGAT 
1051 GAACAGTTCA CTACTGCCTC AAGATGCAGT GAGCAGTAAT CTAAGGAAAA 
1101 GTGGCCTGAA AAAGCCTGTG GTTGCTTCCT CGTTAAAAAG GCAGGCCTGT 
1151 GGTCAGCTGT TAGATGAGGC ACAAGTGACT TTATCCTTCC AAGACTGGCT 
1201 GGCCAGTGTC ACAGAACGCA TCCATCAAAC CATGCACTAT CAGTTTGATG 
1251 GCAAACCAGA ACCATTGGTG TTCCACATTC CTCAGTCATT TTTTGATGCC 
1301 CTGCAACAAA GAATATCTAT AGGAAGTGCA AAAAAACGGC TCCCCAACTC 
1351 CACCACAGCT TTTGTTCGGA AAGATGCCTT GCCACTGGGA ACCTTTTCCA 
1401 AGTATACTTG GCATATCACT AATATCCTGC AAGTTAAACA AATCTTAGAT 
1451 ACCCCAGAGA TGCCCTTGGA AATCACCCGT AGCTTTATCC AGAACCGAGA 
1501 TGGGACTTAT GAGCTATTTA AATGCCCTAA AGTGGAAGTA GAAAGCATAG 
1551 CAGAAACCTA CGGTCGTATA GAAAAACAAC CAGTGCTGCG ACCCTTGGAA 
1601 CTAAAAACTT TTCTCAAAGT TGGCAACACT TCCCCAGATC AAAAGGAGCC 
1651 AACACCTTTC ATCATCGAGT GGATCCCAGA TATCCTTCCC CAATCTAAGA 
1701 TTGGCGAGCT GCGGATCAAG TTTGAGTATG GCCACCACCG GAATGGGCAT 
1751 GTGGCGGAGT ACCAAGACCA GCGGCCCCCC TTGGACCAGC CCTTGGAACT 
1801 GGCCCCTCTG ACCACTATTA CTTTCCCTTA AAGCAAAACA AGATAATAAT 
1851 CTTTTGCTGC TTAATTTGCA CATCCCCACC CCTTGACAAC TTTAAATGCT 
1901 AGTTAGGCAC TTAGATGGCC CTGTTCCTTG GTAAACTGCT CTTAGCTAAG 
1951 ATGCAAATTC TCAGTGCTTT CAAGTGGATT CTGTTGAAGA AAATCTCTTG 
2001 TAAATAGCCT TTTTGATGCT GCTGTGTACA GTCTTCATTA TGCATTGGGC 
2051 AGTATTTCTG GCTAGAGTTT TAAAAGGAAC AGAAAGAAAA CCAGCTTATT 
2101 TTCCTTCTTA CGGACTCATC TTTAGCGTTT ATTTCAACCT TTTGCTAATT 
2151 CTCTGAGAAA TCTGCAGCAC TCAGCCATAC ACCAACAGTG TTGGAAAGTT 
2201 AACACCCTGG TTAGGGCAGA ATGTTAAAGA CCATCTTGGC AGAGTTCCAG 
2251 CCACGCTCTT TATTCTGTTC TCAAATAAAG CAGTGTCACT AGTTTTTCCT 
2301 AAAAAAAAAA AA 
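
The poly A stretch and polyadenylation signal positions quoted above can be checked against the listed sequence. The following is a minimal sketch, not the annotation pipeline actually used for this entry: the function name `polya_features`, the choice of `AATAAA` as the signal, and the minimum run length of ten are all assumptions made for illustration. In this listing the quoted positions appear to coincide with zero-based offsets into the insert.

```python
def polya_features(seq, signal="AATAAA", min_run=10):
    """Return (polya_start, signal_pos) as zero-based offsets, or None if absent.

    polya_start is the offset of the first A of a terminal run of at least
    min_run A residues; signal_pos is the last occurrence of the signal
    hexamer upstream of that run. Both thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    stripped = seq.rstrip("A")
    run_length = len(seq) - len(stripped)
    polya_start = len(stripped) if run_length >= min_run else None
    search_end = polya_start if polya_start is not None else len(seq)
    signal_pos = seq.rfind(signal, 0, search_end)
    return polya_start, (signal_pos if signal_pos >= 0 else None)


# Last 62 bases of the insert above (positions 2251 to 2312 of the listing).
tail = ("CCACGCTCTTTATTCTGTTCTCAAATAAAGCAGTGTCACT"
        "AGTTTTTCCTAAAAAAAAAAAA")
polya, signal = polya_features(tail)
# The tail starts 2250 bases into the insert, so add that offset back.
print(polya + 2250, signal + 2250)
```

Run on the tail of the 2312 bp insert, the sketch gives 2300 and 2273, which agrees with the positions stated for this clone.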



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 1828 bp; peptide length: 574 
Category: putative protein 



1 MEPNSLRTKV PAFLSDLGKA TLRGIRKCPR CGTYNGTRGL SCKNKTCGTI 
51 FRYGARKQPS VEAVKIITGS DLQVYSVRQR DRGPDYRCFV ELGVSETTIQ 
101 TVDGTIITQL SSGRCYVPSC LKAATQGVVE NQCQHIKLAV NCQAEATPLT 
151 LKSSVLNAMQ ASPETKQTIW QLATEPTGPL VQRITKNILV VKCKASQKHS 
201 LGYLHTSFVQ KVSGKSLPER RFFCSCQTLK SHKSNASKDE TAQRCIHFFA 
251 CICAFASDET LAQEFSDFLN FDSSGLKEII VPQLGCHSES TVSACESTAS 
301 KSKKRRKDEV SGAQMNSSLL PQDAVSSNLR KSGLKKPVVA SSLKRQACGQ 
351 LLDEAQVTLS FQDWLASVTE RIHQTMHYQF DGKPEPLVFH IPQSFFDALQ 
401 QRISIGSAKK RLPNSTTAFV RKDALPLGTF SKYTWHITNI LQVKQILDTP 
451 EMPLEITRSF IQNRDGTYEL FKCPKVEVES IAETYGRIEK QPVLRPLELK 
501 TFLKVGNTSP DQKEPTPFII EWIPDILPQS KIGELRIKFE YGHHRNGHVA 
551 EYQDQRPPLD QPLELAPLTT ITFP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2al7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2al7, frame 2 



Report for DKFZphtes3_2al7 . 2 



[LENGTH] 574 

[MW] 64076.89 

[pI] 9.15 

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 14 

[PROSITE] ASN_GLYCOSYLATION 5 

[PROSITE] THIOL_PROTEASE_CYS 1 

[KW] Alpha_Beta 



SEQ MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS 

PRD ccccccccccchhhhhcccchhhhhcccccccccccccccccccccccceeeeccccccc 

SEQ VEAVKIITGSDLQVYSVRQRDRGPDYRCFVELGVSETTIQTVDGTIITQLSSGRCYVPSC 

PRD ceeeeeeecccceeeeeccccccccceeeeeecccccceeeccceeeeeecccccccchh 

SEQ LKAATQGVVENQCQHIKLAVNCQAEATPLTLKSSVLNAMQASPETKQTIWQLATEPTGPL 

PRD hhhhhhhhcchhhhheeehhhhhhhcccccchhhhhhhhhcccchhhhhhhhhcccccch 

SEQ VQRITKNILVVKCKASQKHSLGYLHTSFVQKVSGKSLPERRFFCSCQTLKSHKSNASKDE 

PRD hhhhhhheeeeeecccccccccccceeeeeeecccccccceeeecccccccccccccccc 

SEQ TAQRCIHFFACICAFASDETLAQEFSDFLNFDSSGLKEIIVPQLGCHSESTVSACESTAS 

PRD hhhhhhhhhhhhhhhhhchhhhhhhhhhhccccccceeeeeecccccccceeeccccccc 

SEQ KSKKRRKDEVSGAQMNSSLLPQDAVSSNLRKSGLKKPVVASSLKRQACGQLLDEAQVTLS 

PRD ccchhhhhccccccccccccccccchhhhhhhccccceeehhhhhhhhhchhhhhhhhhh 

SEQ FQDWLASVTERIHQTMHYQFDGKPEPLVFHIPQSFFDALQQRISIGSAKKRLPNSTTAFV 

PRD hhhhhhhhhhhhhhhhhhhcccccccceeehhhhhhhhhhhhhhhhcccccccccceeee 

SEQ RKDALPLGTFSKYTWHITNILQVKQILDTPEMPLEITRSFIQNRDGTYELFKCPKVEVES 

PRD ecccccccccceeeeehhhhhhhhhhhccccccccceeeeeeccccceeeecccceeeeh 

SEQ IAETYGRIEKQPVLRPLELKTFLKVGNTSPDQKEPTPFIIEWIPDILPQSKIGELRIKFE 

PRD hhhhhhhhhccccccccccceeeeecccccccccccceeeeecccccccccccceeeeee 
SEQ YGHHRNGHVAEYQDQRPPLDQPLELAPLTTITFP 
PRD ecccccceeeeccccccccccccccccceeeccc 



Prosite for DKFZphtes3_2al7 . 2 



PS00001  35->39    ASN_GLYCOSYLATION   PDOC00001 
PS00001  44->48    ASN_GLYCOSYLATION   PDOC00001 
PS00001  235->239  ASN_GLYCOSYLATION   PDOC00001 
PS00001  316->320  ASN_GLYCOSYLATION   PDOC00001 
PS00001  414->418  ASN_GLYCOSYLATION   PDOC00001 
PS00005  5->8      PKC_PHOSPHO_SITE    PDOC00005 
PS00005  21->24    PKC_PHOSPHO_SITE    PDOC00005 
PS00005  41->44    PKC_PHOSPHO_SITE    PDOC00005 
PS00005  76->79    PKC_PHOSPHO_SITE    PDOC00005 
PS00005  112->115  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  150->153  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  196->199  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  213->216  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  228->231  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  231->234  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  302->305  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  342->345  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  369->372  PKC_PHOSPHO_SITE    PDOC00005 
PS00005  407->410  PKC_PHOSPHO_SITE    PDOC00005 
PS00006  68->72    CK2_PHOSPHO_SITE    PDOC00006 
PS00006  216->220  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  237->241  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  293->297  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  360->364  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  367->371  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  394->398  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  480->484  CK2_PHOSPHO_SITE    PDOC00006 
PS00006  508->512  CK2_PHOSPHO_SITE    PDOC00006 
PS00008  32->38    MYRISTYL            PDOC00008 
PS00008  93->99    MYRISTYL            PDOC00008 
PS00008  104->110  MYRISTYL            PDOC00008 
PS00008  127->133  MYRISTYL            PDOC00008 
PS00008  312->318  MYRISTYL            PDOC00008 
PS00139  109->121  THIOL_PROTEASE_CYS  PDOC00126 



(No Pfam data available for DKFZphtes3_2al7 . 2) 
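
The Prosite hits listed for this peptide can be reproduced for the simpler patterns with a regular-expression scan. The sketch below is an assumption-laden illustration, not the Prosite scanner used for the report: it encodes only PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]), the function name `scan` is hypothetical, and it reports each hit as `start->start+length` because that convention appears to match the ranges printed above.

```python
import re

# Regular-expression forms of two Prosite patterns.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(seq, regex):
    """Return overlapping hits as (start, start + length), 1-based start."""
    hits = []
    # The lookahead lets overlapping matches be found.
    for m in re.finditer("(?=(" + regex + "))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# First 50 residues of the frame 2 peptide.
peptide = "MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTI"
for name, regex in PATTERNS.items():
    for start, end in scan(peptide, regex):
        print(name, str(start) + "->" + str(end))
```

On these 50 residues the scan finds glycosylation sites at 35->39 and 44->48 and protein kinase C sites at 5->8, 21->24 and 41->44, in agreement with the first rows of each group in the table.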
DKFZphtes3_2dl5 



group: testes derived 

DKFZphtes3_2dl5 encodes a novel 274 amino acid protein with similarity to 
C.elegans cosmid F25H2.1. 

The novel protein contains a Pfam predicted C2-domain. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C.elegans F25H2.1 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3615 bp 

Poly A stretch at pos. 3603, polyadenylation signal at pos. 3578 



1 GCGGCGGCCT CGAGGTGACA ACTGTCTCCG TCGCAGGCTC CGGCGGGGGC 
51 GCAGGAGGTC GCCCGGCGCG TCACTGTCGG GTCGGCGAGC CACGGGGGCC 
101 GCCGCAGCAC CATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC 
151 ATCGGTGAGC TCCCGCAGGA CTTCCTCCGC ATCACGCCCA CACAGCAGCA 
201 GCGGCAGGTC CAGCTGGACG CCCAGGCGGC CCAGCAGCTG CAGTACGGAG 
251 GCGCAGTGGG CACCGTGGGC CGACTGAACA TCACGGTGGT ACAGGCAAAG 
301 TTGGCCAAGA ATTACGGCAT GACCCGCATG GACCCCTACT GCCGACTGCG 
351 CCTGGGCTAC GCGGTGTACG AGACGCCCAC GGCACACAAT GGCGCCAAGA 
401 ATCCCCGCTG GAATAAGGTC ATCCACTGCA CGGTGCCCCC AGGCGTGGAC 
451 TCTTTCTATC TCGAGATCTT CGATGAGAGA GCCTTCTCCA TGGACGACCG 
501 CATTGCCTGG ACCCACATCA CCATCCCGGA GTCCCTGAGG CAGGGCAAGG 
551 TGGAGGACAA GTGGTACAGC CTGAGCGGGA GGCAGGGGGA CGACAAGGAG 
601 GGCATGATCA ACCTCGTCAT GTCCTACGCG CTGCTTCCAG CTGCCATGGT 
651 GATGCCACCC CAGCCCGTGG TCCTGATGCC AACAGTGTAC CAGCAGGGCG 
701 TTGGCTATGT GCCCATCACA GGGATGCCCG CTGTCTGTAG CCCCGGCATG 
751 GTGCCCGTGG CCCTGCCCCC GGCCGCCGTG AACGCCCAGC CCCGCTGTAG 
801 CGAGGAGGAC CTGAAAGCCA TCCAGGACAT GTTCCCCAAC ATGGACCAGG 
851 AGGTGATCCG CTCCGTGCTG GAAGCCCAGC GAGGGAACAA GGATGCCGCC 
901 ATCAACTCCC TGCTGCAGAT GGGGGAGGAG CCATAGAGCC TCTGCCTCGA 
951 TGCCGTTTTG CCCCCGCTCT TTGGACACGC CGACCCGGCG CTCCCCAAGG 
1001 AATGCTGTCC CAACAAGATT CCCGTGAAAG AGCACCCGTG TCGCCCCCTC 
1051 CCGTGGACTT CTGTGCCGCC CCGTCCACAC CTGTTCTTGG GTGCATGTGG 
1101 GTTTTCGGTT CCTGGCGGTC CAGGACGGGG CGGGGGCTCC CCTCCCATCT 
1151 CGTGCTGGGA GGTCTCAGCG CGCTCTCCTG TCCCTGGGAC GTGCGTCTCT 
1201 CCTTCTCATG CCGTTCTGGA AAATGCTCTT GCTGTAGAGA GCAGCTGCTT 
1251 CTGCCAGGGT GTTGGAGGTG GTGGAGCGCC TTCCGATTCC ATTCATGGCA 
1301 TTTTGTGATG TGATGTAATT GGAATAGAGC TGTTGATTTA AGGCACACAC 
1351 AATCCCTCAC ACTGTGGGTT TTTTTTAGAA CTTCCCAGAC GAAAACTCAC 
1401 GCCCTTGCCC TAACGCGCTT TGCTGTCAGC CTGGCCCCTG CCCAGGGCTT 
1451 GGGTCTGGTG AGCTGAGCAG CTTCCTGTGG ATGGTGTGGG GCCGGCCTCT 
1501 GGCCTGGCTC ACCTGGCCAC TGTCCAGCCA GCCTTGTGAC AGACTCCGGC 
1551 CTGAAGGCAG AATGAACCCA CACCTGGAGT GAGGAAGGGG GCCTGGCACG 
1601 GTTGGCCAGG CTCTGCCTGA TTGCCAGCCA GCGGGCATCT GAAGCCGGGT 
1651 CCTTCGCCCG CCGGAGGCTG CCGTCCGTCT CTCCTGCTGC GCTCGTGCCA 
1701 GCTCCGTGGG TGTCCTCCCA GGGAGCTTCT CTTCTCAACA GGCCTTGCGA 
1751 GGCTGGGGTG AGAGGTGATA GAGGCAGCAC TGTGCATGAT TCCGAGAGGG 
1801 TGTGGTGGCA CTGCCAGCCG ACTGCTGACA GCTTGGGAGC TGCTGTGCCC 
1851 AGGACGTGGG TTCAGCGTGG GCGAGGAAAG CCTGGCGAGC GTGGCCCTGT 
1901 AAAAGCTTTC TGAGGCGGGA GGCGCTCACT TACCTCTGAC TGCCTGGGCG 
1951 CTGCGTGTAG CATCTTGGCC TACAGGACAG ATTTTAGGTG ACACCTGGTT 
2001 ATGACAGTCA GAAATTTGAG AAGCTTCTCA CAAGTGATGC ACTTTAAATA 
2051 ATCTGCATGC CATTGAGACA CCTGCATGTC TGGTGTTTGT GGTTCAAGTG 
2101 TCTTGCCGCC GGCCTTCGGA TGTAAACCCA CTGATAACGG ACAGAAAGAG 
2151 AATGCCCACA AGTGGGTCTT CTGTGGAAGA TGCAGAAGGA GGAAGTTAGT 
2201 GCTTACATTT TAGTCTTTTT CTCCCTCAAA AAAATAGGTT AAGTTTCAGT 
2251 GCCAGCTAGA AAATACTGCT TTCTGCCATC GATTGGGGGT GGTTTTTGTC 
2301 AAATATACTG TTGATAAATA TTTATTTTTG TAAACTTGAA GTGTGTGGTG 
2351 GCCGTGGGGG AGGGACATGC TGGCAGCAGG CGCCTTCTTC AGCTGTGGGT 
2401 CCTAAAGGCC TTTGATCCTT TGAAGAAGAA AGACATGGTA TTTGTTCAGC 
2451 AGACGCCGAC CACTCAGACG GAGGGGCCCC TGGGATTCCC TGTCTCAGAT 
2501 GGCCTGGTCT TACGCCTGTG TAGATTTCTT CTCCATTGGG AATGAAGGTG 
2551 TCAGGCGGGA CTGGAACGTT CTAGATGGTA TGTTCCGTGA TATTAACAAC 
2601 TCTAACCCAG GACAGACCAC AAGCCACACT CAGAGGCCTC ACTGTGCTGG 
2651 GGGCTTCGGT GTCCAGGCGC CCAGGTGTGG CCACCAGCAC CGGTTTCTGC 

2701 CTTCGCGTTG CTGGGGTGCA GTGAGACTGC CACACGCGTG CACATGTGGC 

2751 TCTGTGGGTG TCTCCTAGAG AGGACGTGGC CCCTGCTGCC AGCCCTTGAG 

2801 CAGCCCGTGT GGGGGCCCGA GGGACCCACA CAGTGGGGGC CAGCCTCGCT 

2851 GGAGGGAGAG CAACCCTTTG CCGATGACCA CGCTTGCCGC CATCTCTTAG 

2901 TTTTCTTTTT CACAAGCGCT TTATTTTTTT AATAGACAAA TCACATTTTG 

2951 CAAGGCCTTT AATTAAATAA GATTCTTCTT TCCTTCATTT TATGCTTTAT 

3001 TTCCTGTTTG AAGGCTTACT GTAGAAGTGG CTTACTGTAG AAGCAGCTTG 

3051 CTGAGCCCCT CCGAGCGGTC CCCAGAATTA GCTGGTTCAC AACCCCCACC 

3101 CTCCCCCGCC CCCGCCTGTG TCAGGTGTGG ATGAGGTCGT CACACTCAGA 

3151 AGGACAGGCT TGTCTGCCAG CTCACAAGGG GAGGCTGCAG TGGGTTTGGG 

3201 AGCTGGGTTT AGGCCCCTGG TGTCTGAGGG CCCAGGCCTT GCCAGCCTCT 

3251 GCTGCTCCTG CTCCTGGGTT TGAAGATGCA GGCCGATCGC CAGCTCCGTG 

3301 GCAGCGGTCA CTAAGGACAG CCTGACTGTG CCATCTTGGA GCCTCAGGCG 

3351 GGGCTCCGGA GATAGAAGAC AGGTCGCCGG AGGCTCCCCC TCCTCTCCTC 

3401 TCCCCTCTGC AGATGCTCCC TGGGCGCTAC CCTGCAGGGT GCCAGGCAGG 

3451 AGTGGTCTCA GAACGTGCGC TTCTGATTAT TTTACTGGGG TCCATTGTCC 

3501 AGATTTTTCT TTGATTGTAA AATATATTTT TACTTTTTAG TCTTCTAATT 

3551 TAATAAATGA TCCATATAAA AATAGAGAAA TAAAGTCCTT TAAGGGAAGG 
3601 TTTAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 112 bp to 933 bp; peptide length: 274 
Category: similarity to unknown protein 
Classification: no clue 



1 MATTVSTQRG PVYIGELPQD FLRITPTQQQ RQVQLDAQAA QQLQYGGAVG 
51 TVGRLNITVV QAKLAKNYGM TRMDPYCRLR LGYAVYETPT AHNGAKNPRW 
101 NKVIHCTVPP GVDSFYLEIF DERAFSMDDR IAWTHITIPE SLRQGKVEDK 
151 WYSLSGRQGD DKEGMINLVM SYALLPAAMV MPPQPVVLMP TVYQQGVGYV 
201 PITGMPAVCS PGMVPVALPP AAVNAQPRCS EEDLKAIQDM FPNMDQEVIR 
251 SVLEAQRGNK DAAINSLLQM GEEP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2dl5, frame 1 

TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2, 
N = 1, Score = 385, P = 1.1e-35 



>TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 
Length = 457 

HSPs: 

Score = 385 (57.8 bits), Expect = 1.1e-35, P = 1.1e-35 
Identities = 77/182 (42%), Positives = 118/182 (64%) 

Query: 4 TVSTQRGPVYIGELPQDFLRIT-PTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVVQA 62 

TV+ +R V +GELP FLR+ PQQ+++Q+++ T GRL++T+++A 

Sbjct: 5 TVAERRRQVLVGELPPHFLRLAVPIQQTAEPEI-VQP-RMVSFVPP-NTRGRLSVTILEA 61 

Query: 63 KLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIFDE 122 

L KNYG+ RMDPYCR+R+G ++T AN + P WN+ ++ +P V+S Y++IFDE 
Sbjct: 62 NLVKNYGLVRMDPYCRVRVGNVEFDTNVAANAGRAPTWNRTLNAYLPMNVESIYIQIFDE 121 

Query: 123 RAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYAL--LPAAMV 180 

+AF D+ IAW HI +P ++ G D+ + + LSG+QG+ KEGMI+L S+A LP 
Sbjct: 122 KAFGPDEVIAWAHIMLPLAIFNGDNIDEYFQLSGQQGEGKEGMIHLHFSFAPIDLPLQQA 181 
Query: 181 MPPQP 185 
P +P 

Sbjct: 182 APAEP 186 

Score = 92 (13.8 bits), Expect = 1.8e-01, P = 1.7e-01 
Identities = 26/68 (38%), Positives = 38/68 (55%) 

Query: 194 QQGVGYVPITGMPAVCSPGMVPV--ALP--PAAVNAQPRCSEEDLKAIQDMFPNMDQEVI 249 

QQG G + + +P +P+ A P PA +EED K IQ+MFP +D+EVI 

Sbjct: 156 QQGEGKEGMIHLHFSFAPIDLPLQQAAPAEPAPAPLPVEITEEDTKEIQEMFPIVDKEVI 215 

Query: 250 RSVLEAQR 257 

+ +LE +R 
Sbjct: 216 KCILEERR 223 
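
The Identities and Positives figures in the HSP headers follow from the middle line of each alignment: a letter marks an identical pair and `+` marks a conservative substitution. The sketch below illustrates that bookkeeping under stated assumptions; the function names `hsp_counts` and `percent` are hypothetical, and the use of integer truncation is inferred from the figures printed in this listing (for example 118/182 shown as 64%), not from documentation of the program that produced them.

```python
def hsp_counts(query, match, subject):
    """Count identities and positives from a BLAST-style alignment triple.

    All three strings must have the same length. In the match line a
    letter marks an identity and '+' a conservative substitution.
    """
    if not (len(query) == len(match) == len(subject)):
        raise ValueError("alignment lines differ in length")
    identities = sum(1 for ch in match if ch.isalpha())
    positives = identities + match.count("+")
    return identities, positives, len(query)


def percent(count, total):
    """Integer percentage, truncated rather than rounded."""
    return 100 * count // total


# Last block of the second HSP above (query 250-257, subject 216-223).
ident, pos, length = hsp_counts("RSVLEAQR", "+ +LE +R", "KCILEERR")
print(ident, pos, length)
print(percent(77, 182), percent(118, 182))
```

For the eight-residue block the sketch counts 3 identities and 6 positives; applied to the totals of the first HSP it gives 42% and 64%, as printed in the header.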

Pedant information for DKFZphtes3_2dl5, frame 1 



Report for DKFZphtes3_2dl5 . 1 

[LENGTH] 274 

[MW] 30281.97 

[pI] 5.68 

[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 4e-36 

[PFAM] C2 domain 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 16.42 % 
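
Report lines of this kind, a bracketed key followed by a value and an optional trailing e-value, lend themselves to a simple parser. The following is a minimal sketch under the assumption that every line has the form `[KEY] value [e-value]`; the regular expression and the function name `parse_pedant_line` are illustrative and not part of the PEDANT software.

```python
import re

# [KEY] value, optionally followed by an e-value such as 4e-36 or 1.1e-35.
LINE_RE = re.compile(
    r"^\[(?P<key>[A-Za-z]+)\]\s+(?P<value>.*?)"
    r"(?:\s+(?P<evalue>\d+(?:\.\d+)?e-\d+))?$"
)


def parse_pedant_line(line):
    """Return (key, value, evalue) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    evalue = m.group("evalue")
    return m.group("key"), m.group("value"), float(evalue) if evalue else None


report = [
    "[LENGTH] 274",
    "[MW] 30281.97",
    '[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; '
    "Caenorhabditis elegans cosmid F25H2 4e-36",
    "[PFAM] C2 domain",
    "[KW] LOW_COMPLEXITY 16.42 %",
]
for line in report:
    print(parse_pedant_line(line))
```

Lines whose brackets were damaged during text extraction would not match and return `None`, which makes such a parser a convenient check for residual garbling.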

SEQ MATTVSTQRGPVYIGELPQDFLRITPTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVV 

SEG XXXXXXXXXXXXXXXXX 

PRD cccccccccceeeeeccccceeeecccchhhhhhhhhhhhhhhhhcccccceeeeceeeh 

SEQ QAKLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIF 

SEG 

PRD hhhhhhhhcccccccchhhhheeeeeecccccccccccccceeeeeccccccceeeeeec 

SEQ DERAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYALLPAAMV 

SEG xxxxxxxx 

PRD cccccccccceeeeccccccccccccccceeeeeccccccccccceeeeehhhhhhhhhc 

SEQ MPPQPVVLMPTVYQQGVGYVPITGMPAVCSPGMVPVALPPAAVNAQPRCSEEDLKAIQDM 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD ccccceeeeeeeeecccccccccccceeecccccccccccceeeeccccchhhhhhhhhc 

SEQ FPNMDQEVIRSVLEAQRGNKDAAINSLLQMGEEP 

SEG 

PRD ccccchhhhhhhhhhhccccchhhhhhhhhhccc 

(No Prosite data available for DKFZphtes3_2dl5. 1) 

Pfam for DKFZphtes3_2dl5 . 1 

HMM_NAME C2 domain 

HMM *LtVr IleARNLWkMDMnGf SDPYVKVdMdPdpkDtkKWKTJcTiWNNGLN 
L++ ++++A+ + + M+ DPY+++ + + + +T T +N N 

Query 55 LNITVVQAKLAKNYGMT-RMDPYCRLRLGYAVY ETPTAHNGAKN 97 

HMM PVWNEEeFvFedlPyPdlqrkMLRFaVWDWDRFSRBDFIGHCi* 

P+WN + +P + + ++++D+ FS +D 1+ + 
Query 98 PRWN-KVIHCT-VPPGVDSF YLEIFDERAFSMDDRIAWTH 135 
DKFZphtes3_2el2 



group: Transcription Factors 

DKFZphtes3_2el2 encodes a novel 849 amino acid protein with similarity to zinc finger 
proteins. 

The new protein is a putative transcription factor with three C2H2 zinc fingers. Additionally, 
a cytochrome c family heme-binding site signature is present in the protein, a signature that 
is found only in cytochrome c related proteins. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



similarity to finger proteins 

complete cDNA, complete cds, 5 EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3205 bp 

Poly A stretch at pos. 3192, polyadenylation signal at pos. 3171 



1 GGCACGGCCG GGTCCTGGCT GGCCAAACGA GGCTCGCGGA AGCAGCAGCC 
51 GCCGCCTGAC CGCAGCTGGA TTTTGAAGAT TGATCCAAGG GACTGTATTA 
101 ATTTCAGGAA TTGATTTGAA AGACACTGGC TCTGCCACTT AACAGCCATG 
151 TAACCTTGGA TATGGAAGAA AGTAGCAGTG TTGCCATGTT GGTGCCAGAT 
201 ATTGGGGAAC AGGAAGCTAT ACTGACTGCT GAAAGTATCA TCAGTCCTTC 
251 ATTGGAAATT GATGAACAAA GAAAAACTAA ACCAGATCCA TTAATCCATG 
301 TTATCCAGAA GTTAAGCAAG ATAGAAAAAT GAAAAGTCAC AAAAATGTCT 
351 TTTAATTGGG AAGAAACGCC CACGTTCAAG TGCTGCAACA CACTCTCTTG 
401 AAACCCAAGA ACTTTGTGAG ATTCCGGCTA AAGTAATCCA GTCACCTGCT 
451 GCTGATACTA GAAGGGCTGA GATGTCACAA ACAAATTTTA CCCCTGACAC 
501 TCTTGCCCAG AATGAAGGGA AGGCTATGTC TTATCAGTGT AGCCTTTGTA 
551 AGTTTCTATC ATCATCCTTT TCCGTGTTAA AAGATCATAT TAAGCAACAT 
601 GGTCAGCAAA ATGAAGTGAT ACTGATGTGC TCAGAGTGCC ATATTACATC 
651 TAGAAGCCAG GAGGAACTTG AAGCCCACGT GGTGAATGAC CATGACAATG 
701 ATGCCAATAT CCACACCCAA TCCAAAGCCC AACAGTGCGT AAGCCCCTCC 
751 AGCTCTTTGT GTCGGAAAAC CACAGAAAGA AATGAAACCA TTCCAGATAT 
801 CCCAGTAAGT GTGGACAATC TACAGACTCA TACTGTCCAA ACTGCATCTG 
851 TGGCAGAAAT GGGTAGGAGG AAATGGTATG CATACGAACA GTACGGCATG 
901 TATCGATGCT TGTTTTGTAG TTATACTTGT GGCCAGCAGA GAATGTTGAA 
951 AACACACGCT TGGAAACATG CTGGGGAGGT TGATTGCTCC TATCCAATCT 
1001 TTGAAAATGA AAATGAACCC CTAGGCCTGC TGGATTCTTC AGCAGCTGCT 
1051 GCGCCTGGTG GGGTCGATGC AGTCGTCATT GCTATTGGAG AGAGTGAACT 
1101 GAGTATCCAC AATGGGCCAT CAGTGCAAGT GCAGATTTGC AGCTCAGAAC 
1151 AGTTATCATC TTCATCTCCT TTAGAACAGA GTGCAGAAAG AGGAGTACAC 
1201 CTAAGTCAGT CAGTTACCCT GGACCCCAAT GAGGAAGAAA TGCTAGAAGT 
1251 GATTTCTGAT GCAGAGGAGA ATCTGATTCC TGATAGCCTG CTTACATCAG 
1301 CACAGAAAAT CATCAGCAGC AGCCCCAATA AAAAAGGGCA TGTTAACGTG 
1351 ATAGTGGAGC GATTGCCAAG TGCTGAAGAA ACCCTTTCAC AGAAGCGCTT 
1401 CCTCATGAAC ACTGAAATGG AAGAAGGGAA GGACCTGAGC CTGACAGAAG 
1451 CTCAGATTGG GCGCGAAGGA ATGGATGATG TTTATCGTGC TGATAAATGT 
1501 ACTGTTGATA TTGGGGGATT GATCATAGGC TGGAGCAGTT CAGAGAAAAA 
1551 AGACGAGTTA ATGAATAAAG GCCTGGCTAC TGATGAGAAT GCCCCACCAG 
1601 GCCGGAGAAG GACAAATTCT GAGTCTCTTC GATTACACTC ATTAGCTGCA 
1651 GAAGCCCTTG TCACAATGCC TATAAGAGCT GCAGAGTTGA CAAGAGCCAA 
1701 CCTGGGGCAC TATGGAGATA TAAACCTTTT AGATCCAGAT ACTAGTCAAA 
1751 GGCAAGTAGA TAGTACATTG GCAGCGTACT CAAAAATGAT GTCGCCACTT 
1801 AAAAACTCTT CAGATGGATT AACTAGTCTT AACCAAAGCA ACTCCACCTT 
1851 GGTAGCACTC CCAGAGGGTA GGCAGGAATT GTCAGATGGG CAGGTTAAGA 
1901 CAGGCATCAG CATGTCCTTA CTCACCGTCA TTGAAAAATT GAGAGAAAGG 
1951 ACAGACCAAA ACGCTTCAGA CGATGACATT TTGAAAGAGT TGCAGGACAA 
2001 CGCCCAGTGC CAACCCAACA GCGATACAAG TTTGTCCGGA AACAATGTGG 
2051 TGGAATACAT CCCGAATGCT GAACGACCCT ACCGTTGCCG CCTGTGTCAC 
2101 TACACAAGTG GCAACAAGGG CTACATCAAG CAGCACTTAC GAGTCCATCG 
2151 ACAGAGACAG CCTTATCAGT GTCCTATCTG CGAGCACATA GCGGACAACA 
2201 GCAAAGATTT GGAGAGTCAC ATGATCCACC ACTGTAAGAC AAGAATATAC 
2251 CAGTGCAAGC AGTGTGAAGA ATCCTTCCAT TATAAGAGTC AATTGAGGAA 
2301 CCATGAGAGA GAACAGCACA GTCTTCCAGA TACCTTGTCA ATAGCAACTT 
2351 CTAATGAGCC AAGAATTTCC AGTGATACAG CTGATGGAAA ATGTGTCCAG 
2401 GAAGGGAATA AGTCTTCAGT CCAGAAACAA TATAGATGTG ATGTGTGTGA 
2451 TTATACAAGT ACAACATATG TTGGTGTCAG AAACCACAGG CGAATCCATA 
2501 ACTCTGATAA GCCGTACAGA TGCTCTCTGT GTGGGTATGT GTGTAGCCAT 
2551 CCTCCTTCTT TGAAGTCTCA TATGTGGAAA CATGCAAGTG ACCAAAATTA 
2601 CAACTACGAA CAAGTAAACA AGGCTATTAA CGACGCGATT TCACAAAGTG 
2651 GCAGAGTTCT GGGGAAATCC CCTGGAAAGA CTCAATTAAA GAGCAGTGAA 
2701 GAGAGTGCAG ATCCCGTCAC TGGAAGTTCG GAAAATGCAG TGTCATCTTC 
2751 AGAACTGATG TCCCAGACTC CCAGTGAAGT TCTGGGTACC AACGAGAATG 
2801 AGAAACTGAG CCCTACAAGT AATACCTCAT ATAGTTTAGA AAAAATCTCC 
2851 AGTCTGGCCC CTCCTAGCAT GGAGTACTGC GTTTTACTCT TCTGCTGTTG 
2901 TATTTGTGGT TTTGAATCAA CCAGCAAAGA AAACCTCTTG GATCATATGA 
2951 AAGAGCACGA GGGTGAAATT GTAAACATCA TCCTGAATAA GGACCACAAT 
3001 ACAGCTCTAA ACACAAATTA GGTGGAATAA TGACTCGAGC AGGAAAGCAG 
3051 TAGAAGAGGA TTCCTTCACC ACAGTTTCAC CTTTACGCTG TCAGACAACT 
3101 TCCTGCCACA GAAGAAGTCG TTGATGTGAT TTTTGAGGAA ATGACAGATG 
3151 TGACTTTGGA ACCAAACTTG TAATAAAAGG AATTCCAAAT GGAAAAAAAA 
3201 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



90301500: 

Cloning and sequencing of a zinc finger cDNA expressed in mouse testis. 
92310982: 

Zfp-37, a new murine zinc finger encoding gene, is expressed in a 
developmentally regulated pattern in the male germ line. 



Peptide information for frame 1 



ORF from 472 bp to 3018 bp; peptide length: 849 
Category: similarity to known protein 



1 MSQTNFTPDT LAQNEGKAMS YQCSLCKFLS SSFSVLKDHI KQHGQQNEVI 

51 LMCSECHITS RSQEELEAHV VNDHDNDANI HTQSKAQQCV SPSSSLCRKT 

101 TERNETIPDI PVSVDNLQTH TVQTASVAEM GRRKWYAYEQ YGMYRCLFCS 

151 YTCGQQRMLK THAWKHAGEV DCSYPIFENE NEPLGLLDSS AAAAPGGVDA 

201 VVIAIGESEL SIHNGPSVQV QICSSEQLSS SSPLEQSAER GVHLSQSVTL 

251 DPNEEEMLEV ISDAEENLIP DSLLTSAQKI ISSSPNKKGH VNVIVERLPS 

301 AEETLSQKRF LMNTEMEEGK DLSLTEAQIG REGMDDVYRA DKCTVDIGGL 

351 IIGWSSSEKK DELMNKGLAT DENAPPGRRR TNSESLRLHS LAAEALVTMP 

401 IRAAELTRAN LGHYGDINLL DPDTSQRQVD STLAAYSKMM SPLKNSSDGL 

451 TSLNQSNSTL VALPEGRQEL SDGQVKTGIS MSLLTVIEKL RERTDQNASD 

501 DDILKELQDN AQCQPNSDTS LSGNNVVEYI PNAERPYRCR LCHYTSGNKG 

551 YIKQHLRVHR QRQPYQCPIC EHIADNSKDL ESHMIHHCKT RIYQCKQCEE 

601 SFHYKSQLRN HEREQHSLPD TLSIATSNEP RISSDTADGK CVQEGNKSSV 

651 QKQYRCDVCD YTSTTYVGVR NHRRIHNSDK PYRCSLCGYV CSHPPSLKSH 

701 MWKHASDQNY NYEQVNKAIN DAISQSGRVL GKSPGKTQLK SSEESADPVT 

751 GSSENAVSSS ELMSQTPSEV LGTNENEKLS PTSNTSYSLE KISSLAPPSM 

801 EYCVLLFCCC ICGFESTSKE NLLDHMKEHE GEIVNIILNK DHNTALNTN 



BLASTP hits 



Entry S10245 from database PIR: 
finger protein, testis - mouse 

Score = 265, P = 8.4e-23, identities = 61/205, positives = 91/205 

Entry S22954 from database PIR: 
finger protein zfp-37 - mouse 

Score = 265, P = 9.1e-22, identities = 61/205, positives = 91/205 

Entry AF031657_1 from database TREMBL: 
gene: "Zfp94"; product: "zinc-finger protein 94"; Rattus norvegicus 
zinc-finger protein 94 (Zfp94) gene, partial cds. 
Score = 243, P = 1.6e-21, identities = 57/190, positives = 85/190 



Alert BLASTP hits for DKFZphtes3_2el2, frame 1 
No Alert BLASTP hits found 
Pedant information for DKFZphtes3_2el2, frame 1 



Report for DKFZphtes3_2el2 . 1 



[LENGTH] 849 

[MW] 94325.42 

[pI] 5.47 

[HOMOL] PIR:A54661 zinc finger protein ZNF41 - human (fragment) 2e-22 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 3e-09 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 3e-09 

[FUNCAT] 04.03.01 trna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 

[FUNCAT] 04.01.01 rrna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 4e-07 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, 2e-04 

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 2e-04 

[FUNCAT] 11.01 stress response [S. cerevisiae, YMR037c] 3e-04 

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 

[SCOP] d1meyg_ 9.6.1.1.1 a designed zinc finger protein [syntheti 8e-06 

[PIRKW] nucleus 8e-18 

[PIRKW] RNA binding 5e-13 

[PIRKW] duplication 7e-13 

[PIRKW] tandem repeat 1e-21 

[PIRKW] spermatogenesis 6e-16 

[PIRKW] zinc 9e-21 

[PIRKW] zinc finger 1e-21 

[PIRKW] DNA binding 1e-21 

[PIRKW] metal binding 3e-15 

[PIRKW] phosphoprotein 5e-13 

[PIRKW] leucine zipper 1e-13 

[PIRKW] alternative splicing 6e-18 

[PIRKW] eye lens 2e-16 

[PIRKW] oocyte 1e-12 

[PIRKW] transcription factor 6e-18 

[PIRKW] segmentation 7e-13 

[PIRKW] embryo 1e-12 

[PIRKW] transcription regulation 2e-19 

[PIRKW] homeobox 2e-08 

[SUPFAM] POZ domain homology 7e-15 

[SUPFAM] transcription factor Krueppel 7e-13 

[SUPFAM] zinc finger protein ZFP-36 1e-21 

[SUPFAM] homeobox homology 2e-08 

[SUPFAM] unassigned homeobox proteins 2e-08 

[PROSITE] CYTOCHROME_C 1 

[PROSITE] MYRISTYL 10 

[PROSITE] ZINC_FINGER_C2H2 3 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 18 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 7 

[PFAM] Zinc finger, C2H2 type 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 5.65 % 



SEQ MSQTNFTPDTLAQNEGKAMSYQCSLCKFLSSSFSVLKDHIKQHGOQNEVILMCSECHITS 

SEG xxxxxxxxxxxxxxx 

lmeyF 

SEQ RSQEELEAHWNDHDNDANIHTQSKAQQCVSPSSSLCRKTTERNETIPDIPVSVDNLQTH 

SEG 

lmeyF 

SEQ TVQTASVAEMGRRKWYAYEQYGMYRCLFCSYTCGQQRMLKTHAWKHAGEVDCS YPI FENE 

SEG 

lmeyF 

SEQ NEPLGLLDSSAAAAPGGVDAVVIAIGESELSIHNGPSVQVQICSSEQLSSSSPLEQSAER 

SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx. . . 

lmeyF 

SEQ GVHLSQSVTLDPNEEEMLEVISDAEENLIPDSLLTSAQKIISSSPNKKGHVNVIVERLPS 

SEG 

lmeyF 
SEQ AEETLSQKRFLMNTEMEEGKDLSLTEAQIGREGMDDVYRADKCTVDIGGLIIGWSSSEKK

SEG 

lmeyF 

SEQ DELMNKGLATDENAPPGRRRTNSESLRLHSLAAEALVTMPIRAAELTRANLGHYGDINLL 

SEG 

lmeyF 

SEQ DPDTSQRQVDSTLAAVSKMMSPLKNSSDGLTSLNQSNSTLVALPEGRQELSDGQVKTGIS 



SEQ MSLLTVIEKLRERTDQNASDDDILKELQDNAQCQPNSDTSLSGNNVVEYIPNAERPYRCR 
SEG 



SEQ LCHYTSGNKGYIKQHLRVHRQRQPYQCPICEHIADNSKDLESHMIHHCKTRIYQCKQCEE 

SEG 

1meyF TTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCCCEEETTTTE

SEQ SFHYKSQLRNHEREQHSLPDTLSIATSNEPRISSDTADGKCVQEGNKSSVQKQYRCDVCD 

SEG 

1meyF EECCHHHHHHHHHHHC

SEQ YTSTTYVGVRNHRRIHNSDKPYRCSLCGYVCSHPPSLKSHMWKHASDQNYNYEQVNKAIN 

SEG 

lmeyF 

SEQ DAISQSGRVLGKSPGKTQLKSSEESADPVTGSSENAVSSSELMSQTPSEVLGTNENEKLS 

SEG 

lmeyF 

SEQ PTSNTSYSLEKISSLAPPSMEYCVLLFCCCICGFESTSKENLLDHMKEHEGEIVNIILNK

SEG 

lmeyF 

SEQ DHNTALNTN 

SEG 

lmeyF 



SEG 
lmeyF 



lmeyF 



TTTEETT 



Prosite for DKFZphtes3_2e12.1

PS00001 104->108 ASN_GLYCOSYLATION PDOC00001
PS00001 445->449 ASN_GLYCOSYLATION PDOC00001
PS00001 454->458 ASN_GLYCOSYLATION PDOC00001
PS00001 457->461 ASN_GLYCOSYLATION PDOC00001
PS00001 497->501 ASN_GLYCOSYLATION PDOC00001
PS00001 646->650 ASN_GLYCOSYLATION PDOC00001
PS00001 784->788 ASN_GLYCOSYLATION PDOC00001
PS00004 98->102 CAMP_PHOSPHO_SITE PDOC00004
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004
PS00005 59->62 PKC_PHOSPHO_SITE PDOC00005
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005
PS00005 357->360 PKC_PHOSPHO_SITE PDOC00005
PS00005 385->388 PKC_PHOSPHO_SITE PDOC00005
PS00005 425->428 PKC_PHOSPHO_SITE PDOC00005
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005
PS00005 696->699 PKC_PHOSPHO_SITE PDOC00005
PS00005 726->729 PKC_PHOSPHO_SITE PDOC00005
PS00005 817->820 PKC_PHOSPHO_SITE PDOC00005
PS00006 62->66 CK2_PHOSPHO_SITE PDOC00006
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006
PS00006 232->236 CK2_PHOSPHO_SITE PDOC00006
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006
PS00006 314->318 CK2_PHOSPHO_SITE PDOC00006
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 381->385 CK2_PHOSPHO_SITE PDOC00006
PS00006 485->489 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00006 617->621 CK2_PHOSPHO_SITE PDOC00006
PS00006 626->630 CK2_PHOSPHO_SITE PDOC00006
PS00006 741->745 CK2_PHOSPHO_SITE PDOC00006
PS00006 758->762 CK2_PHOSPHO_SITE PDOC00006
PS00006 766->770 CK2_PHOSPHO_SITE PDOC00006
PS00006 817->821 CK2_PHOSPHO_SITE PDOC00006
PS00007 331->339 TYR_PHOSPHO_SITE PDOC00007
PS00007 703->711 TYR_PHOSPHO_SITE PDOC00007
PS00007 596->605 TYR_PHOSPHO_SITE PDOC00007
PS00008 142->148 MYRISTYL PDOC00008
PS00008 185->191 MYRISTYL PDOC00008
PS00008 196->202 MYRISTYL PDOC00008
PS00008 241->247 MYRISTYL PDOC00008
PS00008 349->355 MYRISTYL PDOC00008
PS00008 473->479 MYRISTYL PDOC00008
PS00008 478->484 MYRISTYL PDOC00008
PS00008 645->651 MYRISTYL PDOC00008
PS00008 751->757 MYRISTYL PDOC00008
PS00008 772->778 MYRISTYL PDOC00008
PS00009 130->134 AMIDATION PDOC00009
PS00009 376->380 AMIDATION PDOC00009
PS00028 146->167 ZINC_FINGER_C2H2 PDOC00028
PS00028 684->705 ZINC_FINGER_C2H2 PDOC00028
PS00028 595->617 ZINC_FINGER_C2H2 PDOC00028
PS00190 53->59 CYTOCHROME_C PDOC00169



Pfam for DKFZphtes3_2e12.1



HMM_NAME Zinc finger, C2H2 type

HMM *CpwPDCgKtFrrwsNLrRHMR.T.H*

C++ C+ T R++++L++H H
Query 53 CSE--CHITSRSQEELEAHWN-DH 74

23.25 (bits) f: 539 t: 559 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMRTH*
C C++T ++ ++H+R+H

dkfzphtes3 539 CRL--CHYTSGNKGYIKQHLRVH 559

Query f: 567 t: 587 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
HMM *CpwPDCgKtFrrwsNLrRHMRTH*

CP+ C+ ++ +L+ HM+ H
Query 567 CPI--CEHIADNSKDLESHMIHH 587

33.47 (bits) f: 595 t: 616 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMR.T.H*

C+ C+++F ++S+LR+H R H

dkfzphtes3 595 CKQ--CEESFHYKSQLRNHERE-QH 616

Query f: 656 t: 676 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
HMM *CpwPDCgKtFrrwsNLrRHMRTH*

C++ C++T ++ R+H+R+H
Query 656 CDV--CDYTSTTYVGVRNHRRIH 676

24.53 (bits) f: 684 t: 704 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMRTH*
C+ CG++ +++ +L+ HM H

dkfzphtes3 684 CSL--CGYVCSHPPSLKSHMWKH 704

Query f: 809 t: 829 Target: dkfzphtes3_2e12.1 similarity to finger proteins

Alignment to HMM consensus:
HMM *CpwPDCgKtFrrwsNLrRHMRTH*

C + CG ++++NL HM+ H
Query 809 CCI--CGFESTSKENLLDHMKEH 829
DKFZphtes3_2f14



group: testes derived

DKFZphtes3_2f14 encodes a novel 129 amino acid protein with very weak similarity to human
omega protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



weak similarity to omega protein 

complete cDNA, complete cds, 1 EST hit 

Sequenced by EMBL 

Locus : unknown 

Insert length: 2353 bp 

Poly A stretch at pos. 2341, no polyadenylation signal found 



1 GCAGATTCTC CAGGCCCAGC ATCTGCCTCA CCGTGGCCCC CCACAAGCCA 
51 AGCGCCTGCC TTTCAGCAGC CTCTACACAC CCAGCTCCTG CCACCCAATG 
101 GCTCTTTAGG CCAAGCTCAT ACCTCACGAT GATTTTTCCA GGCCCAACTT 
151 TTGTCTCATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT TTCCTGGCAG 
201 CCTGGACAGG CCCAGGTCCT GCCACACACT GGCCTCTCTA CGCCCAGCTC 
251 ATGCCTCACA GTGGCCTCTC CAGGCCCAGC TCCTGTCCCG GGACATCATC 
301 TCCAGGCCCA AAACTTCCTC AAGTCGGCCT CTCCAGGCCC AGTTGCTGCC 
351 TCCCGGCATT CTCTCCAGGC CTAGCTCTTC CTCCTGGCTG TATCTACAAG 
401 ACCAACTCCT GCCTCACAAC AACCTTTTAT GGCTCAGCTC CTGCCCAACT 
451 ACTGCCGGCC TTTGTAGGCC CAAAACTTCC TCAAGTCAAG CTCTTTAGGC
501 CCACCTTCTG CCTTGCAGTG GCCTGTACAG ACCCAGCTCT GGCTTGAGAA 
551 CAGCCTCTGC AGGCCCTGCT CTTGCCTCTT AGCTCCCTCT CCAGGCCCAT 
601 CTCTTGCCTC ACAGTGGCTT CCGTGGGCCA AGTTCCCGCC TGCCTCCCAG 
651 CAGCCTCAAC AGGCCTAGCT CCTCCCTCAC AATGGCTTGT TTAGGTCCAG 
701 TTGATGCCTC TGGCAACCTG TCCAGGCCCA GCTCCTGCCT CACACTGGCC 
751 TCTCTAGGCC GAGGTCCTTT CTCATACTGG CCTGTTTAGG CCCAGCTCAT
801 TCCTCTTGTC ATCTCTCCAG GCCCAGCTTT TGCCTGTTGT TGGCCTCTAC 
851 CTCACAGTGC ACCTTCCAGT CCCACCTCTT GCCTCACCAT GGCCTCCTCT 
901 GACCAGGTTC CTGCCTTTCG GCAGCCTCTA CAGGCCTAGC TGCTGCCTCC 
951 CAATGGCCTT TGTAGGCCAC GCTCATGCCT CACTGTGGCC TTTCCAGGCC 
1001 TAGCTTTCGC TTTTTGGCCA CTCCAGGCCC AGAACTTCCC CCAGTCAGCC 
1051 TCTCCAGGCC CAGCTCTTCC TCCCAGCAAC CTCTGCAGGC CCAAATCATC 
1101 CTCAAATTGG CCTCTTCTTT CCCAGCTCCT GCCTCCTGGT GGCCTCTGAA 
1151 GACCCAAATC GTCCTCCAGT TGGTTTTTCC AGGCCCAGCT CCTGCCTTTT 
1201 GGTGGCCTCT CCAGGTGCAA AACTTCCTCC CATCAGCCTG TCCAGGCCCA 
1251 GCTCATGCCT CTTGGTGGCC TTCTCAGGCC CTGCTTTTGA CTTGGTGGCC 
1301 TCTTCAGGCC CAGAACTTGA ACTCAAGTCA GCCTCTCCAG GCCCAGCTCC 
1351 TGCCTTCTTA AGGTCTGTAC AGGCCCAGCC TCTACCTCAC AGCGGACTCT 
1401 CCACACCCAG CTCTTGCCTC ACTGTAGCCT CCCCAGTCCA AAACTCCTGC 
1451 CTTTTGGCAG CTTCGACAAG CCCAGCTCCT GCCTTTCAAT GACCTCTTTA
1501 GGCCCCGCTC ATTCCTTACA ACGGCCTTTC CAGGCCCAGT TTTTCCCTTT 
1551 TGGCGGCCTC TCCAGGCCCA GAACTTCCTC AAGTCGGCCT CTTTAGGCCC 
1601 AGTTGCTGCC TCCTGGCATC CTCTGCAGGC CGAGCTCTTC CTCCCTGCTG 
1651 TGTCTACAGG CCCAACTCCT GCCTCACAAC AACCTCCTTG GACTCAGCTT 
1701 CTGCCCAGCT CCTGGTGGCC TTTGTAGGCT CAAAATTTTC TCAAATCAAG 
1751 CTCTCCAGGC CTACTGTCAG CCTCGTGGCA GCCTAAACAG GCCCAGCTCC 
1801 TGCCTGACAA TGGCCTCTCC AGGCTTTTCT CCTGCCTCGC AGCAGGCTTT 
1851 CCAGGCCCAG CTCTTGCCTC ATGGTGGCCT TCCCCGGCCA TGTTCCTATC 
1901 TGACTTCTGG CAGCCTCAAC CGGCCCAGCT TCTGCCTCAC ACTGGCCTCT 
1951 CTAGGCCCAG CTCCTTTTTC ACAGTGGCCT CACTACGCCC ATCTCCTACC 
2001 TCAGATCTGC CTCCCAAGAC CCAGCTCCTG TCTCATGGTG GTCTCTCTTA 
2051 CACCAGCTCC TGCCTCACAA TGGCCTCGTC TGGCCCATCT TCTGCCTCAC 
2101 AGTGGCCACT CAAGGCCCAT CTTTTGCCTC ATGGTAGCCT CTTCTGGTTT 
2151 TGCTCTTGCC TCACAGTTGC CTCTTCCAGA TCCAGCTTTA AGCCTTTGAT 
2201 GGTCAACAGC ATCAAGGAGC CTAAAGCTTC CCTGGACTCT CATTTGTTCA 
2251 CTTTACAGCA GAGTGCCTTA GCAAAAACTG TCTCTTAACC TTGAGAGTGG 
2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA 
2351 AAA 



BLAST Results 



No BLAST result 
Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 158 bp to 544 bp; peptide length: 129 
Category: similarity to known protein 
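
The reported peptide length can be cross-checked against the ORF coordinates. The sketch below is illustrative only: it assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is what the figures in this entry suggest (158 to 544 bp gives 387 bp, or 129 codons). The function name `orf_peptide_length` is hypothetical and not part of the original annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Codon count of an ORF given 1-based, inclusive nucleotide coordinates.

    Assumption: the stop codon lies outside the reported interval.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# ORF coordinates and peptide length as reported for this entry.
start, end, reported = 158, 544, 129
computed = orf_peptide_length(start, end)
print(f"ORF {start}..{end} bp -> {computed} aa (reported: {reported})")
```

A mismatch here would usually point to a transcription error in one of the coordinates rather than in the peptide itself.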



1 MATFPGQVST YFLAAWTGPG PATHWPLYAQ LMPHSGLSRP SSCPGTSSPG 
51 PKLPQVGLSR PSCCLPAFSP GLALPPGCIY KTNSCLTTTF YGSAPAQLLP 
101 AFVGPKLPQV KLFRPTFCLA VACTDPALA 

BLASTP hits 

Entry 170697 from database PIR: 
omega protein - human (fragment) 

Score = 79, P = 2.8e-03, identities = 32/94, positives = 38/94



Alert BLASTP hits for DKFZphtes3_2f14, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_2f14, frame 2



Report for DKFZphtes3_2f14.2



[LENGTH] 129

[MW] 13421.76

[pI] 9.14

[PROSITE] MYRISTYL 2

[KW] Irregular

[KW] LOW_COMPLEXITY 10.85 %



SEQ MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR

SEG xxxxxxxxxxxxxx 

PRD cccccccceeehhhhhcccccccccccccccccccccccccccccccccccccccccccc 

SEQ PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA 

SEG 

PRD cccccccccccccccccccccccccceeeccccccccccccccccccccccccccccccc

SEQ VACTDPALA 

SEG 

PRD ccccccccc



Prosite for DKFZphtes3_2f14.2

PS00008 6->12 MYRISTYL PDOC00008

PS00008 92->98 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_2f14.2)
DKFZphtes3_2g7



group: testes derived

DKFZphtes3_2g7 encodes a novel 359 amino acid protein with similarity to neurofilament
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to neurofilament proteins 

complete cDNA, complete cds, 6 EST hits (5 hits are out of a testis 
library) 

Sequenced by EMBL 

Locus : unknown 

Insert length: 1613 bp 

Poly A stretch at pos. 1595, polyadenylation signal at pos. 1557 

1 GCCACACAGG CTCCTTGGAG TAAGAGTGTG AGAAACTGGA TGAAGACAGC 

51 TGTATTCTTT TGGAAGCGTT CGAGATTGGT CTGTCTCTAC CAACTAAAAA 

101 CTTCTAGCTT AAGTGCAGAG ATTTAAGGAG ATCAACAAAA ACTCAGTCTA 

151 GACATATTAT GAGGCTGGGA GGGTATCAAC AGACTTGAGT TCTTGTCAGC 

201 AAGATCACCT GCTTTTAATA TTGTCCTCAG GGTCTGAGCA CATCTGGAAG 

251 TGAGGTCAAT CAAGTTAGAC CCCAAAAACT TTTGTGACAA CAGTGAAGAG 

301 GGGAAAATAA ACACACCACA AACATGAACC TCAACCCCCC GACATCTGCT 

351 CTTCAGATCG AGGGCAAAGG CAGCCATATT ATGGCTAGAA ATGTAAGCTG 

401 CTTTCTAGTC AGGCACACCC CTCATCCCAG AAGAGTCTGC CACATCAAAG 

451 GCTTGAATAA CATTCCAATC TGTACTGTGA ATGATGATGA GAATGCATTT 

501 GGAACATTGT GGGAAGTTGG CCAGTCTAAC TACTTAGAGA AGAACAGGAT 

551 ACCATTTGCC AATTGCAGTT ACCCCCCGAG CACTGCAGTC CAGAAGAGCC 

601 CTGTAAGAGG AATGTCGCCA GCCCCAAACG GTGCCAAAGT GCCTCCACGG 

651 CCTCATTCTG AGCCCAGTAG AAAAATTAAA GAGTGCTTCA AAACTTCCAG 

701 TGAGAATCCC TTAGTAATTA AAAAGGAAGA AATTAAGGCC AAAAGACCAC 

751 CATCACCTCC AAAGGCATGC TCTACTCCTG GCTCCTGTTC TTCAGGGATG 

801 ACAAGTACCA AGAATGATGT GAAAGCAAAC ACCATTTGCA TACCAAACTA 

851 TCTGGATCAG GAAATAAAAA TCCTGGCAAA GCTCTGTAGC ATTTTGCATA 

901 CTGATTCTCT GGCAGAAGTT TTACAGTGGC TGCTTCATGC AACTTCAAAA 

951 GAAAAAGAGT GGGTCTCAGC TTTGATTCAT TCTGAGCTTG CCGAGATAAA 

1001 CCTGTTAACT CATCACAGAA GAAACACCTC AATGGAACCA GCAGCAGAGA 

1051 CTGGGAAGCC ACCCACAGTT AAATCACCAC CCACAGTTAA ATTGCCCCCA 

1101 AATTTTACTG CAAAATCAAA AGTGCTGACC AGAGATACAG AAGGGGATCA 

1151 ACCAACCAGA GTGTCAAGTC AAGGATCTGA AGAAAACAAG GAAGTACCAA 

1201 AAGAGGCTGA GCACAAGCCT CCACTACTTA TAAGAAGAAA TAATATGAAA 

1251 ATACCTGTTG CAGAATATTT CAGCAAACCA AATTCTCCTC CCAGGCCTAA 

1301 CACTCAGGAG AGTGGATCAG CAAAACCAGT GTCAGCAAGG AGTATACAAG 

1351 AATACAACCT CTGTCCCCAA AGAGCATGTT ATCCTTCAAC ACACCGGAGG 

1401 TAGAAGTTCT AGACTGGGTG AATTCTTTCA TGAATATGAG CTTCACATTT 

1451 ACATCATCAA ATTATTTTTC AAATGAATAT TTTTGGTATT GAGGAATCAA 

1501 GTGGTCCTCT TTATGGTGGC ACATGTAAAT CTAAAAATAC CTGTATGTAA 

1551 TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA 

1601 AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 324 bp to 1400 bp; peptide length: 359
Category: similarity to known protein 
1 MNLNPPTSAL QIEGKGSHIM ARNVSCFLVR HTPHPRRVCH IKGLNNIPIC

51 TVNDDENAFG TLWEVGQSNY LEKNRIPFAN CSYPPSTAVQ KSPVRGMSPA

101 PNGAKVPPRP HSEPSRKIKE CFKTSSENPL VIKKEEIKAK RPPSPPKACS 

151 TPGSCSSGMT STKNDVKANT ICIPNYLDQE IKILAKLCSI LHTDSLAEVL 

201 QWLLHATSKE KEWVSALIHS ELAEINLLTH HRRNTSMEPA AETGKPPTVK 

251 SPPTVKLPPN FTAKSKVLTR DTEGDQPTRV SSQGSEENKE VPKEAEHKPP 

301 LLIRRNNMKI PVAEYFSKPN SPPRPNTQES GSAKPVSARS IQEYNLCPQR 

351 ACYPSTHRR 



BLASTP hits 
Entry A43427 from database PIR: 

neurofilament triplet H1 protein - rabbit (fragment)

Score = 118, P = 5.6e-04, identities = 79/290, positives = 110/290

Entry RNNFH_1 from database TREMBL:

Rat high molecular weight neurofilament (NF-H) protein mRNA, 3' end.
Score = 115, P = 9.5e-04, identities = 69/281, positives = 100/281

Entry B43427 from database PIR:

neurofilament protein H form H2 (repetitive region) - rabbit (fragment)
Score = 111, P = 1.3e-03, identities = 64/269, positives = 102/269



Alert BLASTP hits for DKFZphtes3_2g7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2g7, frame 3



Report for DKFZphtes3_2g7.3



[LENGTH] 359

[MW] 39725.53

[pI] 9.45

[PROSITE] MYRISTYL 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 9

[PROSITE] PKC_PHOSPHO_SITE 10

[PROSITE] ASN_GLYCOSYLATION 4

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.18 %



SEQ MNLNPPTSALQIEGKGSHIMARNVSCFLVRHTPHPRRVCHIKGLNNIPICTVNDDENAFG 

SEG 

PRD ccccccccceeecccccceeeeccceeeeecccccccccccccccccccccccccccccc 

SEQ TLWEVGQSNYLEKNRIPFANCSYPPSTAVQKSPVRGMSPAPNGAKVPPRPHSEPSRKIKE 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ CFKTSSENPLVIKKEEIKAKRPPSPPKACSTPGSCSSGMTSTKNDVKANTICIPNYLDQE 

SEG 

PRD hcccccccceeeehhhhhhccccccccccccccccccccccccccccceeeeccccchhh 

SEQ IKILAKLCSILHTDSLAEVLQWLLHATSKEKEWVSALIHSELAEINLLTHHRRNTSMEPA 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

SEQ AETGKPPTVKSPPTVKLPPNFTAKSKVLTRDTEGDQPTRVSSQGSEENKEVPKEAEHKPP 

SEG . . . .xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeecccccccceeeeccccccccccccccccccc 

SEQ LLIRRNNMKIPVAEYFSKPNSPPRPNTQESGSAKPVSARSIQEYNLCPQRACYPSTHRR

SEG 

PRD eeeeccccccceeeeecccccccccccccccccccchhhhhhccccccccccccccccc 



Prosite for DKFZphtes3_2g7.3

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001
PS00001 80->84 ASN_GLYCOSYLATION PDOC00001
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005
PS00005 161->164 PKC_PHOSPHO_SITE PDOC00005
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005
PS00005 262->265 PKC_PHOSPHO_SITE PDOC00005
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005
PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005
PS00005 356->359 PKC_PHOSPHO_SITE PDOC00005
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006
PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006
PS00006 207->211 CK2_PHOSPHO_SITE PDOC00006
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006
PS00006 272->276 CK2_PHOSPHO_SITE PDOC00006
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006
PS00008 153->159 MYRISTYL PDOC00008
PS00008 158->164 MYRISTYL PDOC00008
PS00008 284->290 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_2g7.3)
DKFZphtes3_2h1



group: transmembrane protein

DKFZphtes3_2h1 encodes a novel 116 amino acid protein with weak similarity to C. elegans
cosmid C13F10.

The novel protein contains 1 transmembrane region.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.



similarity to C. elegans C13F10.5 
TRANSMEMBRANE 1 
Sequenced by EMBL 
Locus: /map="2"
Insert length: 1156 bp 

Poly A stretch at pos. 1143, polyadenylation signal at pos. 1121 



1 GGCCATCAAA ATAACTAAAC CATGTCATTT GGAGCAACAA AGCCACTGCG 
51 GCCTCCATTT GGGCCAAGCT CTGACTGCAA TGATGCCTCT GCCCCGACCC 
101 GGGCCTCGCT GTGACTGACA ATGCCGCTGC ATCTTTTCAG CAGTCATTGA 
151 TGAGGAAGTA TCTACATCCT CCTTCCCACT ACCAGATTTT GCTTGGAGAA 
201 AAGCAGTTTC CTGAAATAAT TCTGTGACGA GCTTCTTCCA CATTAGGACA 
251 AAAATGCTGG AAGCGGCTCA GCCCCAGGGC AGCACATCAG AGACACCATG 
301 GAACACAGCC ATTCCTCTGC CGTCGTGCTG GGACCAGTCT TTCCTGACCA 
351 ATATCACCTT CTTGAAGGTT CTTCTCTGGT TGGTCCTGCT GGGACTGTTT 
401 GTGGAACTGG AATTTGGCCT GGCATATTTT GTCCTGTCCT TGTTCTATTG 
451 GATGTACGTC GGGACACGAG GCCCTGAAGA GAAGAAAGAG GGAGAGAAGA 
501 GCGCCTACTC TGTGTTCAAT CCAGGCTGTG AAGCCATCCA GGGCACCCTG 
551 ACTGCAGAGC AGTTGGAGCG CGAGTTACAG TTGAGACCCC TGGCAGGGAG 
601 ATAGGACCCA GCTGTGCTGT CATGCAGCTA ACCTCTGATG TGGTCTTCCT 
651 CACCATTGGC TATGGATTTG ATTTCAGGTG TATAGGACTA AGGGCAGCTT 
701 GCGGGTTAGC TCTGTGACTG CATAGTTTTT CTACCTTCTT TCCCTGATCT 
751 TTTGCTGCCA TTTGATCTTT GATAGTTTTG GTGAAACTCT CTAAAATACA 
801 TTCACTGTGG GTCCGACGCA ATTTATAAAA ATTATGTACT CAAGAAGGGA 
851 GACCTGTTTG TTTCATTTCT CATCTGTTTG GGAGATGATT TTAGAGCACT 
901 AGAAAGGCAC TGGGGAGATT CTCAGCTTAA AACATCCAGC AGTTTGAAGT 
951 ATGATTAGGT ACATCAGGGC TGCATTGTCA ATGTTCTCTT TAAGTCTTTT 
1001 AACATTTATA GCAATTTTTT TTTTCCCGGA GAGTTTAGGT TGCAAGTTTT 
1051 GGGTTTCTTG TTTGTTTTTG TTTTGCTTCC TGCTTTAATT CTTTAATTTT 
1101 CAGTCATTAC TGGTATTGAA AAATAAAATA TCTTTAAAAC ATCAAAAAAA 
1151 AAAAAA 



BLAST Results 



Entry HS313307 from database EMBL: 
human STS SHGC-16715. 
Score = 1222, P = 1.4e-48, identities = 248/251



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 254 bp to 601 bp; peptide length: 116 
Category: similarity to unknown protein 



1 MLEAAQPQGS TSETPWNTAI PLPSCWDQSF LTNITFLKVL LWLVLLGLFV 
51 ELEFGLAYFV LSLFYWMYVG TRGPEEKKEG EKSAYSVFNP GCEAIQGTLT 
101 AEQLERELQL RPLAGR 
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2h1, frame 2

TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid
C13F10., N = 1, Score = 141, P = 8.2e-10

>TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid
C13F10.

Length = 171

HSPs:

Score = 141 (21.2 bits), Expect = 8.2e-10, P = 8.2e-10
Identities = 32/82 (39%), Positives = 52/82 (63%) 

Query: 27 DQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYS 86 

+QS ++ T + V++++V L ++FG +F+LSL + Y T G ++ GE SAYS 
Sbjct: 90 EQSVVS — TRI AWV Y V VGQAL AAWVQ FGAV F F I LS L I L FT Y WNT -G RRRRG EMS A YS 144 

Query: 87 VFNPGCEAIQGTLTAEQLEREL 108 

VFN CE + G++TAE ER++ 
Sbjct: 145 VFNDNCERLAGSMTAEHFERDM 166 

Pedant information for DKFZphtes3_2h1, frame 2

Report for DKFZphtes3_2h1.2



[LENGTH] 116

[MW] 13092.19

[pI] 4.64

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 32.76 %



SEQ MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV

SEG xxxxxxxxxxxxxxxxxxxxx . . . . 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhchhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR 

SEG xxxxxxxxxxxxxxxxx. . 

PRD hhhhhhhhcccccchhhhhcccceeeecccccccccccchhhhhhhhhhccccccc 

MEM 



Prosite for DKFZphtes3_2h1.2

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006

PS00007 78->86 TYR_PHOSPHO_SITE PDOC00007

PS00007 77->86 TYR_PHOSPHO_SITE PDOC00007

PS00008 97->103 MYRISTYL PDOC00008
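
The Prosite rows above can be reproduced with a simple pattern scan over the 116-residue peptide listed for frame 2. The following is a minimal sketch, not the tool that produced this report: it assumes the standard PROSITE consensus patterns for PS00006 ([ST]-x(2)-[DE]) and PS00001 (N-{P}-[ST]-{P}), written here as regular expressions, and it assumes the "start->end" column means 1-based start and start plus pattern length, which is what the rows in this listing appear to follow. The function name `scan` is hypothetical.

```python
import re

# Peptide for frame 2, copied from the listing above (116 residues).
PEPTIDE = (
    "MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFV"
    "ELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLT"
    "AEQLERELQLRPLAGR"
)

# PROSITE consensus patterns rewritten as regular expressions.
CK2_PHOSPHO_SITE = "[ST]..[DE]"        # PS00006: [ST]-x(2)-[DE]
ASN_GLYCOSYLATION = "N[^P][ST][^P]"    # PS00001: N-{P}-[ST]-{P}


def scan(pattern: str, seq: str) -> list[tuple[int, int]]:
    """Return (start, start + match length) for every match, 1-based.

    A lookahead is used so that overlapping sites are all reported.
    """
    hits = []
    for m in re.finditer("(?=(" + pattern + "))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, pattern in [("CK2_PHOSPHO_SITE", CK2_PHOSPHO_SITE),
                      ("ASN_GLYCOSYLATION", ASN_GLYCOSYLATION)]:
    for start, end in scan(pattern, PEPTIDE):
        print(f"{name} {start}->{end}")
```

Short consensus patterns like these match frequently by chance, which is consistent with the entry's own remark that no predictive Prosite motifs were found.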



(No Pfam data available for DKFZphtes3_2h1.2)
DKFZphtes3_2h15



group: testes derived

DKFZphtes3_2h15 encodes a novel 855 amino acid protein with very weak similarity to S. pombe
cdc23.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to cdc23 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4619 bp 

Poly A stretch at pos. 4598, polyadenylation signal at pos. 4589 

1 GAAGGCGTCC CGGCATCGGC CAAGATTCTA CATTGCTCAT CTGGGCATCT 
51 GAGCCTCCTT CGAAGTTTCC TGTCACAACT GTCCTCTTGA CAGCATGGAT 

101 GAGGAGGAAG ACAATCTGTC TCTGCTGACC GCACTGCTGG AAGAAAATGA 

151 GTCAGCCTTG GATTGTAATT CAGAAGAAAA TAACTTCTTG ACGCGGGAAA 

201 ATGGCGAGCC CGACGCATTT GATGAGCTCT TTGATGCCGA CGGCGACGGT 

251 GAATCTTATA CAGAAGAGGC TGATGATGGA GAAACAGGAG AGACAAGAGA 

301 CGAAAAGGAA AATCTGGCCA CTCTCTTTGG AGATATGGAG GACTTAACAG 

351 ATGAAGAAGA AGTTCCCGCA TCACAGTCAA CTGAAAATAG GGTCCTCCCT 

401 GCTCCTGCCC CCAGGCGAGA GAAAACGAAT GAAGAGTTGC AAGAGGAATT 

451 AAGGAATTTG CAAGAGCAAA TGAAGGCCTT ACAAGAGCAG CTAAAAGTAA 

501 CAACAATTAA ACAGACAGCA AGCCCAGCCC GTCTGCAAAA ATCCCCTGAG 

551 AAGTCTCCCC GGCCACCTCT TAAGGAGAGG AGAGTTCAGA GAATTCAGGA 

601 GTCAACATGC TTTTCTGCGG AGCTTGATGT CCCTGCGCTA CCAAGAACCA 

651 AGAGGGTGGC TCGAACACCA AAGCCTTCAC CTCCAGATCC CAAAAGCTCA 

701 TCTTCAAGGA TGACAAGTGC ACCCTCCCAA CCCCTACAGA CGATTTCTCG 

751 GAACAAACCT AGTGGGATAA CTAGAGGTCA AATTGTGGGG ACCCCAGGAA 

801 GTTCTGGGGA AACGACTCAA CCCATCTGTG TGGAAGCCTT CTCTGGTCTG 

851 CGGCTCAGGC GGCCTCGAGT ATCCTCCACA GAAATGAACA AGAAAATGAC 

901 CGGCCGAAAA CTGATCAGAC TGTCTCAGAT CAAGGAAAAG ATGGCCAGAG 

951 AGAAGCTGGA AGAAATAGAT TGGGTGACAT TTGGGGTTAT ATTGAAGAAG 
1001 GTTACGCCAC AGAGTGTGAA TAGTGGAAAA ACCTTCAGCA TATGGAAACT 
1051 GAATGATCTT CGTGACCTGA CACAATGTGT GTCCTTGTTC TTATTTGGAG 
1101 AAGTTCACAA AGCGCTCTGG AAGACGGAGC AGGGGACTGT CGTAGGGATC 
1151 CTCAATGCCA ACCCCATGAA GCCCAAGGAT GGTTCAGAGG AGGTGTGTTT 
1201 ATCTATCGAT CATCCTCAGA AGGTCTTAAT TATGGGTGAA GCTCTTGACC 
1251 TGGGAACCTG TAAAGCCAAG AAGAAGAATG GAGAGCCGTG CACGCAGACT 
1301 GTGAATTTGC GTGACTGTGA GTACTGTCAG TACCATGTCC AGGCTCAGTA 
1351 CAAGAAGCTC AGTGCAAAGC GTGCGGATCT GCAGTCCACC TTCTCTGGAG 
1401 GACGAATTCC AAAGAAGTTT GCCCGCAGAG GCACCAGCCT CAAAGAACGG 
1451 CTGTGCCAAG ATGGCTTTTA CTACGGAGGG GTTTCTTCTG CCTCGTATGC 
1501 AGCTTCAATT GCAGCAGCTG TGGCTCCTAA GAAGAAGATT CAAACCACTC 
1551 TGAGTAATCT GGTTGTTAAG GGCACAAACT TGATCATCCA GGAAACACGG 
1601 CAAAAACTCG GAATACCCCA GAAGAGCCTG TCTTGCTCTG AGGAGTTCAA 
1651 GGAACTGATG GACCTGCCGA CGTGTGGAGC CAGGAACTTA AAACAACATT 
1701 TAGCCAAAGC CTCAGCTTCA GGGATTATGG GGAGCCCAAA ACCAGCCATC 
1751 AAGTCCATCT CGGCCTCAGC ACTCTTGAAG CAACAGAAGC AGCGGATGTT 
1801 GGAGATGAGG AGAAGGAAAT CAGAAGAAAT ACAGAAGCGA TTTCTGCAGA 
1851 GCTCAAGTGA AGTTGAGAGC CCAGCTGTGC CATCTTCATC AAGACAGCCC 
1901 CCTGCTCAGC CTCCACGGAC AGGATCCGAG TTCCCCAGGC TGGAGGGAGC 
1951 CCCGGCCACA ATGACGCCCA AGCTGGGGCG AGGTGTCTTG GAAGGAGATG 
2001 ATGTTCTCTT TTATGATGAG TCACCACCAC CAAGACCAAA ACTGAGTGCT 
2051 TTAGCAGAAG CCAAAAAGTT AGCTGCTATC ACCAAATTAA GGGCAAAAGG 
2101 CCAGGTTCTT ACAAAAACAA ACCCAAACAG CATTAAGAAG AAACAAAAGG 
2151 ACCCTCAGGA CATCCTGGAG GTGAAGG^AC GTGTAGAAAA AAACACCATG 
2201 TTTTCTTCTC AAGCTGAGGA TGAATTGGAG CCTGCCAGGA AAAAAAGGAG 
2251 AGAACAACTT GCCTATCTGG AATCTGAGGA ATTTCAGAAA ATCCTAAAAG 
2301 CAAAATCAAA ACACACAGGC ATCCTGAAAG AGGCCGAGGC TGAGATGCAG 
2351 GAGCGCTACT TTGAGCCACT GGTGAAAAAA GAACAAATGG AAGAAAAGAT 
2401 GAGAAACATC AGAGAAGTGA AGTGCCGTGT CGTGACATGC AAGACGTGCG 
2451 CCTATACCCA CTTCAAGCTG CTGGAGACCT GCGTCAGTGA GCAGCATGAA
2501 TACCACTGGC ATGATGGTGT GAAGAGGTTT TTCAAATGTC CCTGTGGAAA 
2551 CAGAAGCATC TCCTTGGACA GACTCCCGAA CAAGCACTGC AGTAACTGTG 
2601 GCCTCTACAA ATGGGAACGG GACGGAATGC TAAAGGTATG CCATTTGCGT 
2651 ACTAATTTTT GACTCCTTTT AGTGACCCAT GCTAATAATG TGGAACCATC 
2701 TCCTATTAAA ATATTTTCAT TTTTCTAGGA AAAGACTGGT CCAAAGATAG
2751 GAGGAGAAAC TCTGTTACCA AGAGGAGAAG AACATGCTAA ATTTCTGAAC
2801 AGCCTTAAAT AACCCGAACT TCAGACATTT TCCCACAGAC TTCCTGGCCT
2851 CCTGTGACTC TGGAAAGCAA AGGATTGGCT GTGTATTGTC CATTGATTCC
2901 TGATTGACGC CGTCAAAAAC AAATGCTTGT TAAGCCCATA AGCTTTGCCT
2951 GCTTACTTTC TGCCATTGGG TTGGTTTGAT ACCACATTTA ACATTGACAT
3001 TTAAGTGGAA AACCAAGTTA TCATTGTCTT TTCTAAGCTC AGTGTGGATG
3051 ATTGCATTAC TTCATTCACT GAAGTTTTTG CCCAAAAATT GGAAGGTAAA
3101 CAGAGAGCTA TGTTTCTGTA TCTTTTGGTT ATAGAGTGTT CACTTCTTTA
3151 TCATAACAAA ATTCTAGTGT TTATACGAAC ACCCAGAGGC AAAAGAATTT
3201 GGCTTAATTC TCACTCCAGG TAAGTAGCTT AACTTCTGGG CTTCAGTTTT
3251 CTCATCTGTA AAATCAGGAA GATTGGACTA AGTGATCCTG AAATGTATTT
3301 TTTAGCACTG GATTTCTACA AATAATAAAA CTTTCCCATC TAGATAATGA
3351 TGATCACATA GTCTTGATGT ACGGACATTA AAAGCCAGAT TTCTTCATTC
3401 AATTCTGTTA TCTCTGTTTT ACTCTTTGAA ATTGATCAAG CCACTGAATC
3451 ACTTTGCATT TCAGTTTATA TATAGAGAGA GAAAGAAGGC TGTCTGCTCT
3501 TACATTATTG TGGAGCCCTG TGATAGAAAT ATGTAAAATC TCATATTATT
3551 TTTTTTTTAA TTTTTTTATT TTTTATGACA GGGTCTCACT ATGTCACCCT
3601 GGCTGGAGTG CAGTAGTGCG ATCGCGGCAC ACTGCAGCCT TGGCTTCCCT
3651 GGGCTCAAGC AGTCCTCCCA CCTCAGTCTC CCAAATAGCT AGGACTACAG
3701 GCGTGCGTGA CCAAGCCCAG CTAATTTTTG CATTTTTTGT AGAGATGGGG
3751 TTTTGCCATG TTGCTCAGGC TGGTCTCAAA CTCCTGAGCA CTAGCAATCC
3801 ACCCACCTCT GTTTCCAAAA AAAAAAAAAA AATGAAAGGT CAACCCCTAT
3851 GCAAATTACC ACAGCAAAGG TTTCATTCAG GAGATTCTTC CATCTGGGCA
3901 ACCTGGTTTT CCAAATATCA TTTGACCTAA GTGAATGTTG ATACTAGCTA
3951 AAGATTGGGT AAATTGGTTG AATTATTGTA TTGAAGCTTG AGCTGTAGCT
4001 AAAAGTAATT TAGGTTTCCC CTAAGATGTT ATTATGTTAG GGACATAACA
4051 CTTTTGGGAG GTTGTTGTGG GAGATGGTTG ATTTAGGTTT TCAAAAGCTA
4101 GAAATAAAAT TTACATGCCT TAGATTTCAT AAAATTCTGC TCTAATTGGG
4151 TGGAAGGTGC TGTATCTAAC TTGTGTTCCT CCTAAGGTTA TGTCCTAATA
4201 ACTATTCTTT TAGGAGTATA CTTCTACTTT ATAGAAGGTT GCTTTTCTTT
4251 TTAATTTTTT CTAACAAAGA AAAGAATAAA GTATTTATTA ATAAGAACCA
4301 GAAAGCACTT GAAACTGATG TTTTTAATGG CTCATTTAGG GTAGATTTAT
4351 TTATCTCATT AACTTAAAAC AGCTATGTGT ATGAAATAGG TCACAACAGA
4401 ACTTGAACAC CAGGTTGGTG TCTGAGCAAT CCCTTTCTTA TGGGAAAAAC
4451 AATGTTCTTG TTTGAACAGA GGGTATCATT GCAGTCAGTA TTCACGTGTA
4501 TATTGTTATA TAAGTTGTAT AATATGCTTG TAAAGGCTGA GGGTGAGCTG
4551 TATCTGGATG CCTTTTTACA ATTTGATTTT AACTTTTAAA ATAAATTTAA
4601 AACATAAAAA AAAAAAAAA



BLAST Results



No BLAST result



Medline entries



No Medline entry



Peptide information for frame 2



ORF from 95 bp to 2659 bp; peptide length: 855
Category: similarity to known protein
Classification: Cell division



1 MDEEEDNLSL LTALLEENES ALDCNSEENN FLTRENGEPD AFDELFDADG
51 DGESYTEEAD DGETGETRDE KENLATLFGD MEDLTDEEEV PASQSTENRV
101 LPAPAPRREK TNEELQEELR NLQEQMKALQ EQLKVTTIKQ TASPARLQKS
151 PEKSPRPPLK ERRVQRIQES TCFSAELDVP ALPRTKRVAR TPKPSPPDPK
201 SSSSRMTSAP SQPLQTISRN KPSGITRGQI VGTPGSSGET TQPICVEAFS
251 GLRLRRPRVS STEMNKKMTG RKLIRLSQIK EKMAREKLEE IDWVTFGVIL
301 KKVTPQSVNS GKTFSIWKLN DLRDLTQCVS LFLFGEVHKA LWKTEQGTVV
351 GILNANPMKP KDGSEEVCLS IDHPQKVLIM GEALDLGTCK AKKKNGEPCT
401 QTVNLRDCEY CQYHVQAQYK KLSAKRADLQ STFSGGRIPK KFARRGTSLK
451 ERLCQDGFYY GGVSSASYAA SIAAAVAPKK KIQTTLSNLV VKGTNLIIQE
501 TRQKLGIPQK SLSCSEEFKE LMDLPTCGAR NLKQHLAKAS ASGIMGSPKP
551 AIKSISASAL LKQQKQRMLE MRRRKSEEIQ KRFLQSSSEV ESPAVPSSSR
601 QPPAQPPRTG SEFPRLEGAP ATMTPKLGRG VLEGDDVLFY DESPPPRPKL
651 SALAEAKKLA AITKLRAKGQ VLTKTNPNSI KKKQKDPQDI LEVKERVEKN
701 TMFSSQAEDE LEPARKKRRE QLAYLESEEF QKILKAKSKH TGILKEAEAE
751 MQERYFEPLV KKEQMEEKMR NIREVKCRVV TCKTCAYTHF KLLETCVSEQ
801 HEYHWHDGVK RFFKCPCGNR SISLDRLPNK HCSNCGLYKW ERDGMLKVCH
851 LRTNF
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2h15, frame 2

TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell
division cycle protein 23"; S.pombe chromosome II cosmid c1347., N =
2, Score = 284, P = 7e-21

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2,
Score = 203, P = 7e-12

TREMBL:SCDNA52A_1 gene: "DNA52"; Saccharomyces cerevisiae DNA52 gene,
complete cds., N = 2, Score = 201, P = 7.9e-12

TREMBLNEW:AC006234_6 gene: "F5H14.6"; Arabidopsis thaliana chromosome
II BAC F5H14 genomic sequence, complete sequence., N = 2, Score = 211,
P = 1.7e-15

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2,
Score = 203, P = 7.2e-12

>TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division
cycle protein 23"; S.pombe chromosome II cosmid c1347.
Length = 593

HSPs:

Score = 284 (42.6 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21
Identities = 97/383 (25%), Positives = 186/383 (48%)

EKTNEELQEELRNLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQ 1 68 
E+ + +L+E + LQ Q+ +QE+ ++ + ++ AS + + PR P ++ RV + 
EENDLDLEE— KRLQRQLNEIQEKKRLRSAQKEASSENAEVI— QVPRSPPQQVRVLTVS 63 

ESTCFSAE LDVPALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQP LQTIS 218 

+ + L + K V+ P P PK R+ A +Q L+T+ 



p+ C ++ +S 



+ Q+ + + K E E+D +V G++ T ++VN K + + L DL+ 



Query: 


109 


Sbjct: 


8 


Query: 


169 


Sbjct: 


64 


Query: 


219 


Sbjct: 


124 


Query: 


276 


Sbjct: 


184 


Query: 


332 


Sbjct: 


240 


Query: 


390 


Sbjct: 


300 


Query: 


450 


Sbjct: 


354 


Score 


= 41 


Identities = 


Query: 


453 


Sbjct: 


465 


Score 


- 40 


Identities « 


Query: 


536 


Sbjct: 


481 



WK + GTV+ +LN +KPK+ L +D VL+ +G + LG C 



+++K+GE C ++ R + C+YHV ++ + R + S+ + 



+QT 



Expect = 7.0e-21, Sum P(2) - 7.0e-21 
%), Positives - 17/43 (39%) 



S AS A++ K + SN + GTN 



5.0 bits), Expect » 8.9e-21, Sum P(2) " 8.9e-21 
13/26 (50%), Positives - 18/26 (69%) 



LA +AS IM +PK ++ S S SA+L 
LASFNAS-IM-NPKSSLPSFSNSAIL 504 

Pedant information for DKFZphtes3_2h15, frame 2

Report for DKFZphtes3_2h15.2

[LENGTH] 855

[MW] 96135.01

[pI] 8.96

[HOMOL] TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division cycle protein 23"; S.pombe chromosome II cosmid c1347. 5e-16

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL150c] 1e-11

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YIL150c] 1e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL150c] 1e-11

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 12.05 %

[KW] COILED_COIL 4.21 %

SEQ MDEEEDNLSLLTALLEENESALDCNSEENNFLTRENGEPDAFDELFDADGDGESYTEEAD 

SEG xxxxx 

PRD cccchhhhhhhhhhhhhhhhccccccccceeeeccccccccceeeecccccccceeeeec 

COILS 

SEQ DGETGETRDEKENLATLFGDMEDLTDEEEVPASQSTENRVLPAPAPRREKTNEELQEELR 

SEG xxxxxxxxxxxx xxxxxxxxx 

PRD cccccccccccchhhhhhcccccccceeeccccccccccccccccccchhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC 

SEQ NLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQESTCFSAELDVP 

SEG xxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccceeeeecccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCC

SEQ ALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQPLQTISRNKPSGITRGQIVGTPGSSGET

SEG xxxxxxxxxxxxx 

PRD cccccceeeecccccccccccchhhhhhhccccchhhhhhccccccceeeeecccccccc 

COILS 

SEQ TQPICVEAFSGLRLRRPRVSSTEMNKKMTGRKLIRLSQIKEKMAREKLEEIDWVTFGVIL 

SEG 

PRD cccccccccchhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeeeee 

COILS 

SEQ KKVTPQSVNSGKTFSIWKLNDLRDLTQCVSLFLFGEVHKALWKTEQGTVVGILNANPMKP 

SEG 

PRD cccccccccccceeeeeeeccchhhhhhheeeeecchhhhhhhhccceeeeecccccccc 

COILS 

SEQ KDGSEEVCLSIDHPQKVLIMGEALDLGTCKAKKKNGEPCTQTVNLRDCEYCQYHVQAQYK 

SEG 

PRD ccccceeeeecccccceeeccccccccccccccccccccceeecccccccchhhhhhhhh 

COILS 

SEQ KLSAKRADLQSTFSGGRIPKKFARRGTSLKERLCQDGFYYGGVSSASYAASIAAAVAPKK 

SEG xxxxxxxxxxxxxxxxxxx. . . 

PRD hhhhhhhhhhhhccccccccccccccchhhhhhhccccccccccchhhhhhhhhhhhcch 

COILS 

SEQ KIQTTLSNLVVKGTNLIIQETRQKLGIPQKSLSCSEEFKELMDLPTCGARNLKQHLAKAS

SEG 

PRD hhhhhhheeecccceeeehhhhhhhcccccccchhhhhhhhhhccccccchhhhhhhhhh 

COILS 

SEQ ASGIMGSPKPAIKSISASALLKQQKQRMLEMRRRKSEEIQKRFLQSSSEVESPAVPSSSR 

SEG XXXXXXXXXXXXXXX 

PRD hhcccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 

COILS 

SEQ QPPAQPPRTGSEFPRLEGAPATMTPKLGRGVLEGDDVLFYDESPPPRPKLSALAEAKKLA 

SEG XXXXXXXX XXXXXXXXXXXX 

PRD ccccccccccccccccccccccccccccccccccceeeeeccccccchhhhhhhhhhhhh 

COILS 

SEQ AITKLRAKGQVLTKTNPNSIKKKQKDPQDILEVKERVEKNTMFSSQAEDELEPARKKRRE 

SEG xxxxx 

PRD hhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhh 

COILS 

SEQ QLAYLESEEFQKILKAKSKHTGILKEAEAEMQERYFEPLVKKEQMEEKMRNIREVKCRW 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheee 

COILS 

SEQ TCKTCAYTHFKLLETCVSEQHEYHWHDGVKRFFKCPCGNRSISLDRLPNKHCSNCGLYKW

SEG *

PRD eeecceeeeeeecccceeeccccccccceeeeeecccccccccccccccccccccceeec 

COILS 

SEQ ERDGMLKVCHLRTNF 

SEG 

PRD ccccccccccccccc 
COILS 

(No Prosite data available for DKFZphtes3_2hl5.2) 
(No Pfam data available for DKFZphtes3_2hl5.2)
DKFZphtes3_2i5



group: testes derived 

DKFZphtes3_2i5 encodes a novel 151 amino acid protein with weak similarity to C. elegans
cosmid F20D12.3.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to C.elegans F20D12.3 

many ATGs in front of the start of the ORF, 
unspliced intron in 5' region? 

Sequenced by EMBL 

Locus: unknown

Insert length: 2142 bp 

Poly A stretch at pos. 2121, polyadenylation signal at pos. 2102 

1 GCAGTAAATA TGATATGAAA GAATTCTCTA ACTTGGGGGT GGCTTGTAAC 
51 CTGTAATAAA AATATTGCTA AAATACCTTC TCTCACTTTG AAAAAGCATC 

101 TGAGCAATCC TCAGTTATTG GTGAATTCTT ACCAGTGTTT AATTCCTCTC 

151 TTTCCGTTAT GGTCTTAGTG TGGTTGTCCT GGTGTAGTAT TTCAAGAGGA 

201 ACCTGCAGCA AGATGAAAAG AGAGTGGGAC TTGGAGCTAA GAACGTTTTT 

251 GGCTTTAAGT GCTACGTTAA CTCATTAAAT TCTTAGTGAT CTTGGGGAAG 

301 TCCCCTCACC AGTGTGAGCC TCAGTTTTCT TATCTAATAA GTAAGGATAA 

351 TCTTACCCAC CTTATTGCGG GGGCCCGAGG ATTACATGAT TGGTGTAACA 

401 GTAGCACCTT GTACATTTGA AAGGACTAAT ACCAGTGGAC TTTAACCTTG 

451 GCTGGGCTTT GGAATTCTTG GTGGGACTTT TTAATCATGT AGATTCTCAG 

501 GCCCCTGCCT GGCCTGTGGA ACCACAGACT CTATAGGTGG GCCCTTCCAG 

551 AAGGCCTCAT GGGTGGTTCT CATGTGGAAC CTGTGTTGCA AGCCACTGCA 

601 TGGTGTTACT GCTATTAACA TTAAAACTTA TATTTTCCTT ATTGTGTGGA 

651 TATATCTGTG GTGTTTGCCC ATGTATACTT CATTTTACAT TTCTTAAAGA 

701 ATAGAATGGA ATGGTTTTAA GCACGCTACA TTGTCCAGGT TATACCCACA 

751 GAAGAGCTGT TGTGTAACAG AATCAGCATC ATACCTGAAT CATTTGTACA 

801 TTGCATATAA GACTATGTCT AAGTAGAAGA TGCTATGAAA TCATGTCTGC 

851 TGTGGGGCCA GGCATAATTA TGAATGTTAC TTAAGAGCAT AGGTGAGGTG 

901 AGAAAAGGGA ATGTGACTAG TGTTTTAGTA TTTTCTTGGT GTGGGATGAA 

951 GTATAATTCT ttttttTTTT TCTCAACAAA GCAGTAAAAC TAGAAAGAAG
1001 GAGAACTCTT CCCTCAAGAA TGGCTGTACC TTCATATCTA GAGGCACATT 
1051 AAAAAAAAGA ACGTCTGTAC CTTAAAAATG GAGGTCATTT CATTGTGTTC 
1101 ATTTTCAAGG TTGTTGTATG GCTCGGTCAG AACTTTCTGT TACCAGAAGA 
1151 CACTCACATT CAGAATGCTC CATTTCAAGT GTGTTTCACA TCTTTACGGA 
1201 ATGGCGGCCA CCTGCATATA AAAATAAAAC TTAGTGGAGA GATCACTATA 
1251 AATACTGATG ATATTGATTT GGCTGGTGAT ATCATCCAGT CAATGGCATC 
1301 ATTTTTTGCT ATTGAAGACC TTCAAGTAGA AGCGGATTTT CCTGTCTATT 
1351 TTGAGGAATT ACGAAAGGTG CTAGTTAAGG TGGATGAATA TCATTCAGTG 
1401 CATCAGAAGC TCAGTGCTGA TATGGCTGAT CATTCTAATT TGATCCGAAG 
1451 TTTGCTGGTC GGAGCTGAGG ATGCTCGTCT GATGAGGGAC ATGAAAACAA
1501 TGAAGAGTCG TTATATGGAA CTCTATGACC TTAATAGAGA CTTGCTAAAT 
1551 GGATATAAAA TTCGCTGTAA CAATCACACA GAGCTGTTGG GAAACCTCAA 
1601 AGCAGTAAAT CAAGCAATTC AAAGAGCAGG TCGTCTGCGG GTTGGAAAAC 
1651 CAAAGAACCA GGTGATCACT GCTTGTCGGG ATGCAATTCG AAGCAATAAC 
1701 ATCAACACAC TGTTCAAAAT CATGCGAGTG GGGACAGCTT CTTCCTAGGT 
1751 GAGGAAAATA CAGGTCATGA AGTTCCTGGC AAAGATTTTC TGTTAAAAAC 
1801 CTATGCTGGT TTGCTTTGGA TCACACCCTG GTGAACCCCG GGTGCTAAGA 
1851 ATGAAAATAA CCTTGGTGAG TTGTACAAAT TAAAGACAAA GAACTACATG 
1901 TGAAGATAGA CTTGCTTTCT ATTTTTAAAT CAGTAGTAGT ACTGTTGCTG 
1951 AATAATACTA GGTTTTTATG GAATAGGATG AATGCTTTTG AAGTATTAGG 
2001 GCTTCAGAGT CCAATTTTGC TTATTTATGG TATATAAATA CATATTTTTT 
2051 TCTTGAAATT GCAATTGAGT TTGTACTTTT CAAATAGATT ATCTACTTTT 
2101 TCATTAAAAT GTAAAGATGT TAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries

No Medline entry



Peptide information for frame 3 



ORF from 1293 bp to 1745 bp; peptide length: 151 
Category: similarity to unknown protein 
Classification: no clue
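
The ORF coordinates, the peptide length and the frame number quoted above are arithmetically linked. The following is a minimal sketch, assuming 1-based inclusive coordinates and an ORF range that excludes the stop codon (both inferred from the figures in this listing, not stated in it); the function name `orf_summary` is hypothetical.

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range.

    Assumption: the range covers whole codons and excludes the stop codon.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    # Frame 1 starts at bp 1, frame 2 at bp 2, frame 3 at bp 3.
    frame = (start_bp - 1) % 3 + 1
    return frame, span // 3


print(orf_summary(1293, 1745))  # (3, 151)
```

For the ORF above this gives frame 3 and 151 residues, in agreement with the reported values.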

1 MASFFAIEDL QVEADFPVYF EELRKVLVKV DEYHSVHQKL SADMADHSNL 

51 IRSLLVGAED ARLMRDMKTM KSRYMELYDL NRDLLNGYKI RCNNHTELLG 

101 NLKAVNQAIQ RAGRLRVGKP KNQVITACRD AIRSNNINTL FKIMRVGTAS 

151 S 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2i5, frame 3 

TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid
F20D12., N = 1, Score = 173, P = 4.5e-12



>TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 
Length = 699

HSPs: 

Score = 173 (26.0 bits), Expect = 4.5e-12, P = 4.5e-12
Identities = 33/130 (25%), Positives = 72/130 (55%)

20 FEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAEDARLMRDMKTMKSRYMELYD 79 
F+E ++L ++D V +L+A++ + ++ +++ AED+ + ++ + Y+ L 
569 FKEADEILEEIDPMTEVRDRLTAELQERQAAVKEIIIRAEDSIAIDNIPDARKFYIRLKA 628 

80 LNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKPKNQVITACRDAIRSNNINT 139 
+ ++R NN + +L+ +N+ I+ RLRVG+P Q++ +CR AI +N
629 NDAAARQAAQLRWNNQERCVKSLRRLNKIIENCSRLRVGEPGRQIVVSCRSAIADDNKQI 688

140 LFKIMRVGTA 149 

+ KI++ G + 
689 ITKILQYGAS 698 
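
The percentages in the HSP header above follow from the match counts. The sketch below assumes the percentage is truncated rather than rounded; this is inferred from figures in this listing (for example 186/383 is printed as 48%), not from BLAST documentation, and the function name `hsp_percent` is hypothetical.

```python
def hsp_percent(matches: int, aligned: int) -> int:
    """Percentage of matching positions in an HSP, truncated to an integer.

    Assumption: truncation (floor), as the figures in this listing suggest.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


print(hsp_percent(33, 130))  # identities: 25
print(hsp_percent(72, 130))  # positives: 55
```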



Pedant information for DKFZphtes3_2i5, frame 3 



Report for DKFZphtes3_2i5.3



[LENGTH] 151

[MW] 17304.07

[pI] 9.33

[HOMOL] TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 2e

[KW] Alpha_Beta



SEQ MASFFAIEDLQVEADFPVYFEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAED

PRD ccceeeehhhhhhccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ ARLMRDMKTMKSRYMELYDLNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKP 

PRD hhhhhccchhhhhheeeccchhhhhhheeeeeccchhhhhhhhhhhhhhhhhcccccccc 

SEQ KNQVITACRDAIRSNNINTLFKIMRVGTASS 

PRD cceeeeehhhhhhcccceeeeccceeecccc 



(No Prosite data available for DKFZphtes3_2i5.3)
(No Pfam data available for DKFZphtes3_2i5.3)
DKFZphtes3_2119



group: testes derived 

DKFZphtes3_2119 encodes a novel 166 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, no EST hits 

Sequenced by EMBL 

Locus: unknown

Insert length: 1079 bp 

Poly A stretch at pos. 1053, polyadenylation signal at pos. 1038 



1 CCACAGGACA CACTGTTCCC AGGGCACAGA CACCCTGGGC TTTGGTTGGG 
51 TCTTGGCCTC CAGGTAGGGC CCTGTTGGGC AGCGGGCAGC AACTCCTGAG 
101 ACACTACTGT GATTCTTGGT GGTGGCTGTG GTAAAAAACC TGCAGGGCTA 
151 GAGTTTGGGG TGAGATTCAG CAGTAACTGT GGCCTCTCCT AGTGACAGTA 
201 TGTCACTCCC ACTCCCAGCA CGCATGCCCA CAGGCCACGG CCTCCACATC 
251 ACAAACCCCC CACCAAGTTG CCCATCTATG GAGCAGCTCC CATACGGCAG 
301 GGTCAGGCTC TTACCTCCAC CTCCAGGGCA CAGACAGGGG GAGCTCTGTC 
351 TCACTGTAAG GCAATGAGGA GAGTTGAGGG CCCAGACCAG GCTAGGGGCC 
401 ATCCCCTTTC CCGAGCAGGC CTCAGGGAAG GACCAGCCCC ATTCCCATCT 
451 GACCTAGGTC TTAGCCCAGG AGCCTGCATA GGGAAGAAAG GACAGACAGG
501 GCCTCCTTAC TGGCTGACAC TCAGGAGGGG CTGGGGCAAG AGAGCAGAGG 
551 GAGCGCAGGG CCAGGCAGGG GCTGCTGAGG ATCCATGGGA GCTCAGGGTG 
601 CACAAGGGGG CTGCCCTTCC TGGGCTGCAG GCAGCATCCC TATGGGAGCT 
651 GAGAAAGTCC AATCCTGAGA TGGGACAGTG CTGCCCAGGG GTGTGTGGCT 
701 GGGCCCTGAC AACAGTCTCC CCAAAAGTGA CCACATCACC AGGCTCAGTT 
751 CCAGGAAGGC TGAGAAGTGC CCAGTACACT GAGGATGCAC CTCAGTTACA 
801 TAAAATAAAT GAAACTGGAG TACTAACGTA CAGTTTAAAG GTTATAGTTA 
851 CTATTTTTAT ATGATATACT AGTAATTTTT GAATAGGGTA AACTTTAGGT 
901 GTTTTGACAC CAAAAGAAAA CTACATGAGT TCATGCATGT GTTAAATTGC 
951 TTTACTGTAG TAATCATTTA CATGTATATG TATATATGAA TATAATTATG 

1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC 

1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 364 bp to 861 bp; peptide length: 166 
Category: putative protein 
Classification: no clue 

1 MRRVEGPDQA RGHPLSRAGL REGPAPFPSD LGLSPGACIG KKGQTGPPYW 
51 LTLRRGWGKR AEGAQGQAGA AEDPWELRVH KGAALPGLQA ASLWELRKSN 
101 PEMGQCCPGV CGWALTTVSP KVTTSPGSVP GRLRSAQYTE DAPQLHKINE 
151 TGVLTYSLKV IVTIFI 
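
The peptide above is the conceptual translation of the insert from bp 364 onward. As an illustrative sketch only, the following translates the first nine codons of that ORF (bp 364 to 390 of the nucleotide listing) with the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical and not part of the original analysis pipeline.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame from its first base, stopping at a stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


# bp 364-390 of the DKFZphtes3_2119 insert, copied from the listing above.
fragment = "ATGAGGAGAG TTGAGGGCCC AGACCAG"
print(translate(fragment))  # MRRVEGPDQ
```

The output matches the first nine residues of the peptide listed above.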



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2119, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_2119, frame 1



Report for DKFZphtes3_2119.1



[LENGTH] 166

[MW] 17691.35

[pI] 9.54

[KW] All_Beta 

[KW] LOW_COMPLEXITY 7.23 % 



SEQ MRRVEGPDQARGHPLSRAGLREGPAPFPSDLGLSPGACIGKKGQTGPPYWLTLRRGWGKR

SEG 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccceeeeecccccc 

SEQ AEGAQGQAGAAEDPWELRVHKGAALPGLQAASLWELRKSNPEMGQCCPGVCGWALTTVSP 

SEG xxxxxxxxxxxx * 

PRD ccccccccccccccceeeeccccccccchhhhhhhhhhcccccccccccccceeeeeccc 

SEQ KVTTSPGSVPGRLRSAQYTEDAPQLHKINETGVLTYSLKVIVTIFI 

SEG 

PRD ccccccccccccccccccccccccceeeccccceeeehhhhhhccc 



(No Prosite data available for DKFZphtes3_2119.1)
(No Pfam data available for DKFZphtes3_2119.1)
DKFZphtes3_2ml8



group: nucleic acid management 

DKFZphtes3_2ml8 encodes a novel 950 amino acid protein with similarity to mouse Dhm1.

The protein seems to play a role in nucleotide metabolism, RNA metabolism, but also in DNA 
repair and cell cycle. The yeast homologue is a DNA strand exchange protein required for 
sporulation and homologous recombination. 

The novel protein can find application as a multifunctional nuclease / exoribonuclease.
nearly identical to mouse Dhm1

complete cDNA, complete cds, start at bp 42, EST hits

Sequenced by EMBL 

Locus: unknown

Insert length: 3022 bp

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2981

1 CTCGTCAGCC GGTCGGCCGC CGCCTCCAGC CGTGTGCCGC TATGGGAGTC 

51 CCGGCGTTCT TCCGCTGGCT CAGCCGCAAG TACCCGTCCA TCATAGTCAA 

101 CTGCGTGGAA GAGAAGCCAA AAGAATGCAA TGGTGTAAAG ATTCCAGTTG 

151 ATGCCAGTAA ACCTAATCCA AATGATGTGG AGTTTGATAA TCTGTATTTG 

201 GATATGAATG GAATCATCCA TCCCTGTACT CATCCTGAAG ACAAACCAGC 

251 ACCAAAAAAT GAAGATGAAA TGATGGTTGC AATTTTTGAG TACATTGACA 

301 GACTTTTCAG TATTGTAAGA CCAAGAAGAC TTCTCTACAT GGCAATAGAT 

351 GGAGTGGCAC CACGTGCTAA AATGAACCAG CAGCGTTCAA GGAGGTTCAG 

401 GGCATCAAAA GAAGGAATGG AAGCAGCAGT CGAGAAGCAG CGAGTCAGGG 

451 AAGAAATATT GGCAAAAGGT GGCTTTCTTC CTCCAGAAGA AATAAAAGAA 

501 AGATTTGACA GCAACTGTAT TACACCAGGA ACTGAATTCA TGGACAATCT 

551 TGCTAAATGC CTTCGCTATT ACATAGCTGA TCGTTTAAAT AATGACCCTG 

601 GGTGGAAAAA TTTGACAGTT ATTTTATCTG ATGCTAGTGC TCCTGGTGAA 

651 GGAGAACATA AAATCATGGA TTACATTAGA AGGCAAAGAG CCCAGCCTAA 

701 CCATGACCCA AATACTCATC ATTGTTTATG TGGAGCAGAT GCTGATCTCA 

751 TTATGCTTGG CCTTGCCACA CATGAACCGA ACTTTACCAT TATTAGAGAA 

801 GAATTCAAAC CAAACAAGCC CAAACCATGT GGTCTTTGTA ATCAGTTTGG 

851 ACATGAGGTC AAAGATTGTG AAGGTTTGCC AAGAGAAAAG AAGGGAAAGC 

901 ATGATGAACT TGCCGATAGT CTTCCTTGTG CAGAAGGAGA GTTTATCTTC 

951 CTTCGGCTTA ATGTTCTTCG TGAGTATTTG GAAAGAGAAC TCACAATGGC 

1001 CAGCCTACCA TTCACATTTG ATGTTGAGAG GAGCATTGAT GACTGGGTTT 

1051 TCATGTGCTT CTTTGTGGGA AATGACTTCC TCCCTCATTT GCCATCGTTA 

1101 GAGATTAGGG AAAATGCAAT TGACCGTTTG GTTAACATAT ACAAAAATGT 

1151 GGTACACAAA ACTGGGGGTT ACCTTACAGA AAGTGGTTAT GTCAATCTGC 

1201 AAAGAGTACA GATGATCATG TTAGCAGTTG GTGAAGTTGA GGATAGCATT 

1251 TTTAAAAAGA GAAAGGATGA TGAGGACAGT TTTAGAAGAC GACAGAAAGA 

1301 AAAAAGAAAG AGAATGAAGA GAGATCAACC AGCTTTCACT CCTAGTGGAA 

1351 TATTAACTCC TCATGCCTTG GGTTCAAGAA ATTCACCAGG TTCTCAAGTA 

1401 GCCAGTAATC CGAGACAAGC AGCCTATGAA ATGAGGATGC AGAATAACTC 

1451 TAGTCCTTCG ATATCTCCTA ATACGAGTTT CACATCTGAT GGCTCCCCGT 

1501 CTCCATTAGG AGGAATTAAG CGAAAAGCAG AAGACAGTGA CAGTGAACCT 

1551 GAGCCAGAGG ATAATGTCAG GTTATGGGAA GCTGGCTGGA AGCAGCGGTA 

1601 CTACAAGAAC AAATTTGATG TGGATGCAGC TGATGAGAAA TTCCGTCGGA 

1651 AAGTTGTGCA GTCGTACGTT GAAGGACTTT GCTGGGTTCT TAGATATTAT 

1701 TACCAGGGCT GTGCTTCCTG GAAGTGGTAT TATCCATTTC ATTATGCACC 

1751 ATTTGCTTCA GACTTTGAAG GCATTGCAGA CATGCCATCT GATTTTGAGA 

1801 AGGGTACGAA ACCGTTTAAA CCACTAGAAC AACTTATGGG GGTATTTCCA 

1851 GCTGCAAGTG GTAATTTTCT ACCTCCATCA TGGCGGAAGC TCATGAGTGA 

1901 TCCTGATTCT AGTATAATTG ACTTCTATCC TGAAGATTTT GCTATTGATT 

1951 TGAATGGGAA GAAATATGCA TGGCAAGGTG TTGCTCTCTT GCCATTCGTG 

2001 GATGAGCGAA GGCTACGAGC TGCCCTAGAA GAGGTATACC CAGACCTCAC 

2051 TCCAGAAGAG ACCAGAAGAA ACAGCCTTGG AGGTGATGTC TTATTTGTGG 

2101 GGAAACATCA CCCACTCCAT GACTTCATTT TAGAGCTGTA CCAGACAGGT 

2151 TCCACAGAGC CAGTGGAGGT ACCCCCTGAA CTATGTCATG GGATTCAAGG 

2201 AAAGTTTTCT TTGGATGAAG AAGCCATTCT TCCAGATCAA ATAGTATGTT 

2251 CTCCTGTTCC TATGTTAAGG GATCTGACAC AGAACACTGT AGTCAGTATT 

2301 AATTTTAAAG ACCCACAGTT TGCTGAAGAT TACATTTTTA AAGCTGTAAT 

2351 GCTTCCAGGA GCAAGAAAGC CAGCAGCAGT ACTGAAACCT AGTGACTGGG 

2401 AAAAATCCAG CAATGGACGG CAGTGGAAGC CTCAGCTTGG CTTTAACCGT 

2451 GACCGGAGGC CTGTGCACCT GGATCAGGCA GCCTTCAGGA CTTTGGGCCA 

2501 TGTGATGCCA AGAGGCTCAG GAACTGGCAT TTACAGCAAT GCTGCACCAC 

2551 CACCTGTGAC TTACCAGGGA AACTTATACA GGCCGCTTTT GAGAGGACAA 

2601 GCCCAGATTC CAAAACTTAT GTCAAATATG AGGCCCCAGG ATTCCTGGCG 

2651 AGGTCCTCCT CCCCTTTTCC AGCAGCAAAG GTTTGACAGA GGCGTTGGGG
2701 CTGAACCTCT GCTCCCATGG AACCGGATGC TGCAAACCCA GAATGCAGCC
2751 TTCCAGCCAA ACCAGTACCA GATGCTAGCT GGGCCTGGTG GGTATCCACC
2801 CAGACGAGAT GATCGTGGAG GGAGACAGGG ATATCCCAGA GAAGGAAGGA 
2851 AATACCCTTT GCCACCACCC TCAGGAAGAT ACAATTGGAA TTAAGCTTTT 
2901 GTAAAGCTTT CCCAAATCCT TTCATCATTC TACAGTTTTA TGCTATTTGT 
2951 GGAAAGATTT CCTTCTCAAG TAGTAGTTTT TAATAAAACT ACAGTACTTT 
3001 GTGTAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



95192042:

Characterization of cDNA encoding mouse homolog of fission yeast dhp1+ gene: structural and functional conservation.

97361754:

Cloning and characterization of mouse Dhm2 cDNA, a functional homolog of budding yeast SEP1.



Peptide information for frame 3 



ORF from 42 bp to 2891 bp; peptide length: 950 
Category: strong similarity to known protein 



1 MGVPAFFRWL SRKYPSIIVN CVEEKPKECN GVKIPVDASK PNPNDVEFDN
51 LYLDMNGIIH PCTHPEDKPA PKNEDEMMVA IFEYIDRLFS IVRPRRLLYM
101 AIDGVAPRAK MNQQRSRRFR ASKEGMEAAV EKQRVREEIL AKGGFLPPEE
151 IKERFDSNCI TPGTEFMDNL AKCLRYYIAD RLNNDPGWKN LTVILSDASA
201 PGEGEHKIMD YIRRQRAQPN HDPNTHHCLC GADADLIMLG LATHEPNFTI
251 IREEFKPNKP KPCGLCNQFG HEVKDCEGLP REKKGKHDEL ADSLPCAEGE
301 FIFLRLNVLR EYLERELTMA SLPFTFDVER SIDDWVFMCF FVGNDFLPHL
351 PSLEIRENAI DRLVNIYKNV VHKTGGYLTE SGYVNLQRVQ MIMLAVGEVE
401 DSIFKKRKDD EDSFRRRQKE KRKRMKRDQP AFTPSGILTP HALGSRNSPG
451 SQVASNPRQA AYEMRMQNNS SPSISPNTSF TSDGSPSPLG GIKRKAEDSD
501 SEPEPEDNVR LWEAGWKQRY YKNKFDVDAA DEKFRRKVVQ SYVEGLCWVL
551 RYYYQGCASW KWYYPFHYAP FASDFEGIAD MPSDFEKGTK PFKPLEQLMG
601 VFPAASGNFL PPSWRKLMSD PDSSIIDFYP EDFAIDLNGK KYAWQGVALL
651 PFVDERRLRA ALEEVYPDLT PEETRRNSLG GDVLFVGKHH PLHDFILELY
701 QTGSTEPVEV PPELCHGIQG KFSLDEEAIL PDQIVCSPVP MLRDLTQNTV
751 VSINFKDPQF AEDYIFKAVM LPGARKPAAV LKPSDWEKSS NGRQWKPQLG
801 FNRDRRPVHL DQAAFRTLGH VMPRGSGTGI YSNAAPPPVT YQGNLYRPLL
851 RGQAQIPKLM SNMRPQDSWR GPPPLFQQQR FDRGVGAEPL LPWNRMLQTQ
901 NAAFQPNQYQ MLAGPGGYPP RRDDRGGRQG YPREGRKYPL PPPSGRYNWN



BLASTP hits



No BLASTP hits available



Alert BLASTP hits for DKFZphtes3_2ml8, frame 3 

PIR:149635 mouse Dhm1 protein - mouse, N = 1, Score = 4765, P = 0

PIR:S43891 dhp1 protein - fission yeast (Schizosaccharomyces pombe), N
= 3, Score = 1172, P = 2e-197

PIR:S20126 exoribonuclease RAT1 (EC 3.1.11.-) - yeast (Saccharomyces
cerevisiae), N = 2, Score = 1146, P = 3.8e-175

PIR:S72531 exonuclease II - fission yeast (Schizosaccharomyces pombe),
N = 4, Score = 622, P = 4.2e-125



>PIR: 149635 mouse Dhml protein 
Length = 947

Score = 4765 (714.9 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 884/930 (95%), Positives = 895/930 (96%)



MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 
MGVPAFFRWLSRKYPSI I VNCVEEKPKECNGVKI PVDASKPNPNDVEFDNLYLDMNGI IH 

PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 
PCTHPEDKPAPKNEDEMMVAIFEYIDRLF+IVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 
PCTHPEDKPAPKNEDEMMVAIFEYIDRLFNIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 

ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 
A K GMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 
AIKGGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 
RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPN DPNTHHCLCGADADLIMLG 
RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNQDPNTHHCLCGADADLIMLG 

LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE
LATHEPNFTI IREEFKPNKPKPC LCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 
LATHEPNFTI I REEFKPNKPKPCALCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 

FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 
FIFLRLNVLREYLERELTMASLPF FDVERS DDW FMCFFVGNDFLPHLPSLEIRE AI 
FIFLPXNVLREYLERELTMASLPFPFDVERSNDDWEFMCFFVGNDFLPHLPSLEIREGAI 

DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 
DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSI FKKRKDDEDSFRRRQKE 
DRLVN I YKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDS I FKKRKDDEDSFRRRQKE 

KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 
KRKRMKRDQPAFTPSGILTPHALGSRNSPG QVASNPRQAAYEMRMQ NSSPSISPNTSF 
KRKRMKRDQPAFTPSGILTPHALGSRNSPGCQVASNPRQAAYEMRMQRNSSPSISPNTSF 

TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 
SDGSPSPLGGI+RKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKWQ 
ASDGSPSPLGGIRRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 

SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 
SYVEGLCWVLRYYYQGCASWKW YPFHYAPFASDFEGIADM S+FEKGTKPFKPLEQLMG 
SYVEGLCWVLRYYYQGCASWKWLYPFHYAPFASDFEGIADMSSEFEKGTKPFKPLEQLMG 

VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 
VFPAASGNFLPP+WRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 
VFPAASGNFLPPTWRKLMSDPDS3IIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 
ALEEVYPDLTPEE RRNSLGGDVLFVGK HPL DFILELYQTGSTEPV+VPPELCHGIQG 
ALEEVYPDLTPEENRRNSLGGDVLFVGKLHPLRDFILELYQTGSTEPVDVPPELCHGIQG 

KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 
FSLDEEAILPDQ VCSPVPMLRDLTQNT VSINFKDPQFAEDY+FKA ML PGA RK PA V 
TFSLDEEAILPDQTVCSPVPMLRDLTQNTAVSINFKDPQFAEDYVFKAAMLPGARKPATV 

LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 
LKP DWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHV PRGSGT +Y+N A P 
LKPGDWEKSS NGRQWK PQLG FN RDRR P VH L DQAA FRT LG H VT P RGSGT S VYT NTAL L P AN 

YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 
YQGN YRPLLRGQAQIPKLMSNMRP+DSWRGPPPLFQQ RF+R VGAEPLLPWNRM+Q Q 
YQGNNYRPLLRGQAQIPKLMSNMRPKDSWRGPPPLFQQHRFERSVGAEPLLPWNRMIQNQ 

NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ 929
NAAFQPNQYQML GPGGYPPRRDD RGGRQ 



Query/Sbjct start positions of the alignment rows above (identical for Query and Sbjct in every row): 1, 61, 121, 181, 241, 301, 361, 421, 481, 541, 601, 661, 721, 781, 841, 901



Pedant information for DKFZphtes3_2ml8, frame 3 



Report for DKFZphtes3_2ml8.3



[LENGTH] 950

[MW] 108582.68

[pI] 7.26

[HOMOL] PIR:149635 mouse Dhm1 protein - mouse 0.0

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR048c] 1e-123

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGL173c] 3e-79

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL173c] 3e-79

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL173c] 3e-79

[PIRKW] nucleus 1e-126

[PIRKW] hydrolase 1e-122

[PIRKW] exoribonuclease 1e-122

[PROSITE] MYRISTYL 7

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 4

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 6.21 %

SEQ MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH

SEG 

PRD cccchhhhhhhhhcceeeeeeecccccccccccccccccccccccccccceeeeccceee 
MEM 

SEQ PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhcceeeeeeeccccchhhhhhhhhhhhh 



MEM 



MEM 



MEM 



SEQ ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhh 



SEQ RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 

SEG 

PRD hcccccccceeeeeeeccccccccchhhhhhhhhhhhccccccccccccccccccceeec 

MEM 

SEQ LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 

SEG 

PRD ccccccccccccccccccccccceeeccccccccccccccchhhhhhhhhcccccccccc 

MEM 

SEQ FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhchhhhhhhhhhhheeeeeeccccccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE

SEG xxxxxx 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 

SEG xxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccchhhhhhhhhhhhhhhccccccccccccc 



SEQ TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 

SEG XX XXXXXXXXXXX 

PRD ccccccccchhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 

SEQ SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 

SEG 

PRD hhhhhhheeeeeeccccccccccccccccccccccccccccccccccccccccchhhhhh 

MEM 

SEQ VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

SEG 

PRD hccccccccccccccccccccccceeeccccceeeccccccceeeeeeeeeccchhhhhh 

MEM 

SEQ ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 

SEG 

PRD hhhhhccccchhhhhhcccccceeeeeecccchhhhhhhhhcccccceeecccccccccc 

MEM 

SEQ KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 

SEG 
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PRD cccccceeecccceeeccccccccccccceeeeecccccchhhhheeeccccccccccee 

MEM 

SEQ LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 

SEG 

PRD eccccccccccccccccccccccccccccchhhhhhhhhhcccccccccccccccccccc 

MEM 

SEQ YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 

SEG 

PRD cccccchhhhhcccchhhhhcccccccccccccccchhhhhccccccccccccchhhhhh 

MEM 

SEQ NAAFQPNQYQMLAGPGGYPPRRDDRGGRQGYPREGRKYPLPPPSGRYNWN 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccccceeecccccccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_2ml8.3

PS00001 190->194 ASN_GLYCOSYLATION PDOC00001
PS00001 247->251 ASN_GLYCOSYLATION PDOC00001
PS00001 468->472 ASN_GLYCOSYLATION PDOC00001
PS00001 477->481 ASN_GLYCOSYLATION PDOC00001
PS00002 826->830 GLYCOSAMINOGLYCAN PDOC00002
PS00004 675->679 CAMP_PHOSPHO_SITE PDOC00004
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005
PS00005 559->562 PKC_PHOSPHO_SITE PDOC00005
PS00005 613->616 PKC_PHOSPHO_SITE PDOC00005
PS00005 674->677 PKC_PHOSPHO_SITE PDOC00005
PS00005 868->871 PKC_PHOSPHO_SITE PDOC00005
PS00005 944->947 PKC_PHOSPHO_SITE PDOC00005
PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00006 501->505 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006
PS00006 583->587 CK2_PHOSPHO_SITE PDOC00006
PS00006 619->623 CK2_PHOSPHO_SITE PDOC00006
PS00006 624->628 CK2_PHOSPHO_SITE PDOC00006
PS00006 670->674 CK2_PHOSPHO_SITE PDOC00006
PS00006 723->727 CK2_PHOSPHO_SITE PDOC00006
PS00006 784->788 CK2_PHOSPHO_SITE PDOC00006
PS00007 659->667 TYR_PHOSPHO_SITE PDOC00007
PS00008 125->131 MYRISTYL PDOC00008
PS00008 375->381 MYRISTYL PDOC00008
PS00008 450->456 MYRISTYL PDOC00008
PS00008 600->606 MYRISTYL PDOC00008
PS00008 825->831 MYRISTYL PDOC00008
PS00008 829->835 MYRISTYL PDOC00008
PS00008 926->932 MYRISTYL PDOC00008
PS00009 638->642 AMIDATION PDOC00009
PS00009 934->938 AMIDATION PDOC00009


(No Pfam data available for DKFZphtes3_2ml8.3)






DKFZphtes3_2m20 



group: testes derived 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



group: unknown 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

EST hits are only from testis or uterus libraries
remaining intron in 3' UTR, see EST-BLAST

Sequenced by EMBL 

Locus: unknown 

Insert length: 1341 bp 

Poly A stretch at pos. 1320, polyadenylation signal at pos. 1300



1 GCAATCCAGG AGCTGAATGG TAACTCTTCC ACAAGCGAAA ACTGTTCGTG 
51 AATACAAGCA AAAGGCCCCC CAAGAGGACC CCTGATATGA TCCAGCAGCC 
101 TCGGGCCCCG CTGGTGTTGG AGAAGGCTTC TGGTGAAGGA TTTGGCAAAA 
151 CCGCCGCTAT TATACAGCTC GCTCCTAAAG CTCCTGTTGA CCTGTGTGAG 
201 ACAGAGAAAC TGAGGGCAGC CTTCTTTGCA GTCCCGTTGG AAATGAGAGG 
251 GTCCTTCCTG GTGCTGCTCC TGAGGGAATG CTTCCGAGAC CTGAGCTGGC 
301 TGGCACTCAT CCATAGCGTC CGTGGGGAGG CGGGGCTGCT GGTGACGAGT 
351 ATCGTCCCGA AGACCCCGTT TTTCTGGGCC ATGCACATCA CTGAGGCTCT 
401 GCACCAGAAC ATGCAGGCTC TGTTTAGCAC CCTGGCTCAG GCGGAGGAGC 
451 AGCAGCCCTA CCTGGAGGCT CCACCGTTAT GCGCGGGACT CGCTGTCTGG 
501 CAGAGTACCA CCTGGGGGAT TATGGACACG CCTGGAACAG GTGTTGGGTG 
551 CTGGACAGGG TGGACACCTG GGCTGTGGTC ATGTTCATTG ATTTTGGACA
601 GTTGGCCACC ATCCCTGTGC AGTCTCTGCG CCAGCTAGAC AGCGACGACT 
651 TCTGGACCAT CCCACCCCTG ACTCAGCCAT TCATGCTGGA GAAAGACATT 
701 TTGAGTTCGT ATGAGGTTGT CCATCGAATC CTCAAAGGGA AAATCACTGG 
751 TGCTTTGAAC TCGGCGGTAA CTGCTCCTGC ATCTAACTTG GCTGTTGTCC 
801 CTCCACTCCT GCCCTTGGGG TGTCTGCAGC AGGCTGCTGC CTAGGCCTGG 
851 ACACATTGCA CATCCTAAAG TTTGAAGAGT CTAAATAACG GGGCTTCCCT 
901 CAGCATGTTC CCTCTCCTGT TTGCCACGGA TCCAGAGCCA CCTGCCCTGT 
951 CTTCTCGTAC CCCTTTCACT CTTGAGGCCT GGGAGGTGAA AAAGGCCAGA 
1001 CTGTGCCCAG GATTGATTCA ATTTTGCTTT TACTCCCAGC TTCCCTCTCA 
1051 AAAGAGAGTG AAGTCTCATT TGTCATGTGT CTTCAGTTCC CCAACTTGGC 
1101 ATGAACATTT GAACCAAACA TAGGAAACTA CCATTAGGTT GAAAGCCTGA
1151 GGCAGCTGGG ATGGTCTTTC TTGTGTCTCT TCTTTGCACC CCAGAGCATG 
1201 ATATAAGTGG TCCTAACAGA TTCTGGATAA TGGAGAAGCC CTCTGCTGGT 
1251 TTTCCTGGCA TTCCATGTAG AATAGGTAGA GAATATTTAA CCAATGAGCA 
1301 AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry

Peptide information for frame 2



ORF from 479 bp to 841 bp; peptide length: 121 
Category: questionable ORF 
Classification: no clue 

1 MRGTRCLAEY HLGDYGHAWN RCWVLDRVDT WAVVMFIDFG QLATIPVQSL 
51 RQLDSDDFWT IPPLTQPFML EKDILSSYEV VHRILKGKIT GALNSAVTAP 
101 ASNLAVVPPL LPLGCLQQAA A 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 2 
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 87 bp to 635 bp; peptide length: 183 
Category: putative protein 
Classification: no clue 

1 MIQQPRAPLV LEKASGEGFG KTAAIIQLAP KAPVDLCETE KLRAAFFAVP 

51 LEMRGSFLVL LLRECFRDLS WLALIHSVRG EAGLLVTSIV PKTPFFWAMH 

101 ITEALHQNMQ ALFSTLAQAE EQQPYLEAPP LCAGLAVWQS TTWGIMDTPG 

151 TGVGCWTGWT PGLWSCSLIL DSWPPSLCSL CAS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2m20, frame 2 

Report for DKFZphtes3_2m20.2

[LENGTH] 121

[MW] 13436.69

[pI] 5.81

[KW] Alpha_Beta 

SEQ MRGTRCLAEYHLGDYGHAWNRCWVLDRVDTWAVVMFIDFGQLATIPVQSLRQLDSDDFWT 
PRD ccchhhhhcccccccccccceeeecccccccceeeeeecccccccccccccccccccccc 

SEQ IPPLTQPFMLEKDILSSYEVVHRILKGKITGALNSAVTAPASNLAVVPPLLPLGCLQQAA 
PRD cccccchhhhhhhcchhhhhhhhhhcccccchhhhhhcccccceeeeccccccccccccc 

SEQ A 
PRD C 

(No Prosite data available for DKFZphtes3_2m20.2)
(No Pfam data available for DKFZphtes3_2m20.2)

Pedant information for DKFZphtes3_2m20, frame 3

Report for DKFZphtes3_2m20.3

[LENGTH] 183

[MW] 19971.49

[pI] 5.31

[KW] Alpha_Beta

SEQ MIQQPRAPLVLEKASGEGFGKTAAIIQLAPKAPVDLCETEKLRAAFFAVPLEMRGSFLVL

PRD ccccccccceeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhcchhhhhh 

SEQ LLRECFRDLSWLALIHSVRGEAGLLVTSIVPKTPFFWAMHITEALHQNMQALFSTLAQAE 

PRD hhhhhhcchhhhhhhhhhcccceeeeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EQQPYLEAPPLCAGLAVWQSTTWGIMDTPGTGVGCWTGWTPGLWSCSLILDSWPPSLCSL 

PRD hhhcccccccccccceeeecccceeecccccccccccccccccccceeeeccccccceee 

SEQ CAS 

PRD CCC 



(No Prosite data available for DKFZphtes3_2m20.3)
(No Pfam data available for DKFZphtes3_2m20.3)
DKFZphtes3_2n9



group: testes derived 

DKFZphtes3_2n9 encodes a novel 184 amino acid protein with very weak similarity to Homo
sapiens PAC clone DJ0771P04 from 7q11.21-q11.23.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

on genomic level encoded by HS1186N24, no splice pattern but EST 
matches 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1000 bp 

Poly A stretch at pos. 988, polyadenylation signal at pos. 970



1 CAACTTTTTA AAGATGTGAA TTGGACAGCC AGACTTGCTT ATTTGTCTGA 
51 TATCTTCAGT ATTTTTTAAT GATCTTAATG CTTCTATGCA AGGGAAGAAT 
101 GCAACTTATT TTTCAATGGC AGATAAAGTT GAAGGACAAA AACAGAAGTT 
151 AGAAGCTTGG AAAAACAGAA TTTCTACAGA TTGTTATGAC ATGTTTCATA 
201 ATTTAACAAC AATTATCAAT GAAGTAGGTA ATGATCTTGA TATTGCACAT 
251 CTGCGAAAAG TTATCAGTGA ACATCTTACA AATTTGTTAG AATGTTTTGA 
301 ATTTTATTTT CCATCAAAAG AAGATCCACG CATAGGAAAT TTGTGGATCC 
351 AAAATCCATT TCTTTCATCA AAAGATAACT TAAATTTAAC TGTAACTCTA 
401 CAGGATAAGT TGTTGAAGCT GGCTACCGAC GAAGGATTGA AAATCAGTTT 
451 TGAAAATACA GCATCACTTC CTTCATTTTG GATAAAAGCT AAAAATGACT
501 ATCCTGAGCT TGCTGAGATT GCTTTAAAAT TGCTGCTTCT TTTCCCCTCA 
551 ACATACCTCT GTGAGACCGG ATTCTCTACT TTAAGTGTTA TTAAAACAAA 
601 ACATAGAAAC AGTTTAAATA TACATTATCC CCTGAGGTAG CATTGTCATC 
651 AATCCAACCT AGATTAGACA AATTAACAAG CAAGAAGCAA GCTCACTTAT 
701 CACATTAAAA GCTTTAAATA TTGATATGTA AGGTATTGGT TCAAAGTATG 
751 CATATAAGCA TTGAGTGTGA GGAATTTGCT ATTTCACTTT AAACTTTCTG 
801 TCTAGTTACA GTTATGGAAG TATGAGAAGT TATGAGTGAA ACAGCAATTT 
851 TCTATATAAA TTGCCTATAT GTATATTTTC AATTAAGAAT GTGTACAGTT 
901 TTTATAATTC TATTTTTCCT CATATTTGTC GTATTTATTA AAATATAATT 
951 TTAAATCTGT TGATTCTAAT ATTAAAACAT TTGATCTTAA AAAAAAAAAA 
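
The reported poly A stretch and polyadenylation signal positions can be checked against the last row of the listing. The sketch below is an assumption-laden illustration, not the tool that produced the report: it searches for the common AATAAA/ATTAAA hexamers and for a run of at least ten A residues (both choices are assumptions), and it counts positions from zero, which is the convention that reproduces the reported values for this insert. The function names are hypothetical.

```python
import re

# Last 50 nt of the DKFZphtes3_2n9 insert (row starting at position 951).
TAIL = ("TTAAATCTGT" "TGATTCTAAT" "ATTAAAACAT" "TTGATCTTAA" "AAAAAAAAAA")
TAIL_OFFSET = 950  # number of bases that precede TAIL in the insert

def find_polya_signal(seq, offset=0):
    """0-based position of the first AATAAA or ATTAAA hexamer, or None."""
    m = re.search(r"AATAAA|ATTAAA", seq)
    return None if m is None else m.start() + offset

def find_polya_stretch(seq, offset=0, min_len=10):
    """0-based start of the first run of at least min_len A's, or None."""
    m = re.search("A{%d,}" % min_len, seq)
    return None if m is None else m.start() + offset

print(find_polya_signal(TAIL, TAIL_OFFSET))   # 970
print(find_polya_stretch(TAIL, TAIL_OFFSET))  # 988
```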



BLAST Results 



Entry HS1186N24 from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 1186N24
Score = 4921, P = 5.8e-215, identities = 989/992



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 86 bp to 637 bp; peptide length: 184 
Category: similarity to unknown protein 
Classification: no clue 

1 MQGKNATYFS MADKVEGQKQ KLEAWKNRIS TDCYDMFHNL TTIINEVGND 
51 LDIAHLRKVI SEHLTNLLEC FEFYFPSKED PRIGNLWIQN PFLSSKDNLN 
101 LTVTLQDKLL KLATDEGLKI SFENTASLPS FWIKAKNDYP ELAEIALKLL 
151 LLFPSTYLCE TGFSTLSVIK TKHRNSLNIH YPLR 
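
The ORF coordinates, the peptide length and the frame number are linked by simple arithmetic, which is useful as a sanity check on OCR-damaged numbers. A minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon (this assumption matches the figures reported here); the helper name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, end: int):
    """Return (codon_count, frame) for a 1-based, inclusive ORF span."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
    frame = (start - 1) % 3 + 1
    return span // 3, frame

print(orf_summary(86, 637))  # (184, 2)
```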



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2n9, frame 2 

TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC
clone DJ0771P04 from 7q11.21-q11.23, complete sequence., N = 1, Score =
94, P = 0.042



>TREMBLNEW:AC004863_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC clone
DJ0771P04 from 7q11.21-q11.23, complete sequence.
Length = 533

HSPs: 

Score = 94 (14.1 bits), Expect = 4.3e-02, P = 4.2e-02
Identities = 39/177 (22%), Positives = 75/177 (42%)

Query: 1 MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLD-IAHLRKV 59 

+QG + M D + KL W+ ++ + F L + L+ I + + + 

Sbjct: 354 LQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAEL 413 

Query: 60 ISEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLK 119 

+E L + F+ Y + + + +PF + D+++ LQ +++ L + LK 

Sbjct: 414 KTEFQKRLSD-FKLY ESELTL FSSPFSTKIDSVH--EELQMEVIDLQCNTVLK 463 

Query: 120 ISFENTASLPSFWIKAKNDYPXXXXXXXXXXXXFPSTYLCETGFSTLSVIKTKHRNSL 177

++ +p F+ YP F STY+CE FS + + KTK+ + L 

Sbjct: 464 TKYDKVG-IPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQL 520 
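
The percentages printed next to the Identities and Positives counts can be recomputed from the fractions. The sketch below parses a cleaned statistics line with a regular expression; the pattern and the function name `hsp_percentages` are hypothetical. It truncates rather than rounds, an assumption based on the values printed in this report (for example, 752/753 appears as 99%).

```python
import re

HSP_STATS = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)

def hsp_percentages(line: str):
    """Return (identity %, positive %) as truncated integers."""
    m = HSP_STATS.search(line)
    if m is None:
        raise ValueError("no Identities/Positives fields found")
    ident, n1, pos, n2 = map(int, m.groups())
    return 100 * ident // n1, 100 * pos // n2

line = "Identities = 39/177 (22%), Positives = 75/177 (42%)"
print(hsp_percentages(line))  # (22, 42)
```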

Pedant information for DKFZphtes3_2n9, frame 2 

Report for DKFZphtes3_2n9.2 

[LENGTH] 184

[MW] 21203.53

[pI] 6.52

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.52 %

SEQ MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLDIAHLRKVI 

SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhcccceeecccccccchhhhhhhhh 

SEQ SEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLKI 

SEG 

PRD hhhhhhhhhhhhcccccccccccceeeeccccccccccceeeeehhhhhhhhhhhcccee 

SEQ SFENTASLPSFWIKAKNDYPELAEIALKLLLLFPSTYLCETGFSTLSVIKTKHRNSLNIH 
SEG XXXXXXXXXXXX 

PRD eecccccccceeeeecccchhhhhhhhhhhhhcccccccccccceeeeeecccccceeec 

SEQ YPLR 

SEG 

PRD cccc

(No Prosite data available for DKFZphtes3_2n9 . 2 ) 
(No Pfam data available for DKFZphtes3_2n9 . 2) 
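
The LOW_COMPLEXITY keyword value appears to be the share of residues masked in the SEG rows. A minimal sketch under that assumption follows; the function name `low_complexity_percent` is hypothetical, and the twelve masked residues are taken from the SEG row of this report.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked with x/X in the SEG rows."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return 100.0 * masked / length

# SEG rows of DKFZphtes3_2n9.2: only the third row carries a mask (12 residues).
seg_rows = ["", "", "X" * 12, ""]
print(f"{low_complexity_percent(seg_rows, 184):.2f} %")  # 6.52 %
```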
DKFZphtes3_30f4



group: testes derived 

DKFZphtes3_30f4 encodes a novel 192 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

Sequenced by LMU 

Locus: /map="717.2-8 cR from top of Chr8 linkage group"
Insert length: 1388 bp 

Poly A stretch at pos. 1330, polyadenylation signal at pos. 1310



1 CACTGAGCCC TCCTCAGATG GTTAGTGGCT TCCAACAGCC ATCAGGAGTG 

51 TTTCTTGAAT GCCCCAGGTG TGGAGGACTT GGTCTGTGAC CACCTAGAAC 

101 CCCAGAGCTG AACAGGAAGC CGTCCCTGCA GCAACAAGAG GGCTGGAAGG 

151 GGGAGCTGCA GGCCACCCTC GGCTCTCCCA CTGCTGGGGC GGTGATGTTC 

201 GGGTGACATG TTTGAAAAAT ACTCTTAAAG ATACCAACTG TTCCCTTATA 

251 TGGCTAATGG TTTGTGCAGC CACCAGCGAT GGCGGCCCCT ATTAGAGACC 

301 AGGTTTGTTA AAACACCAAA TATTGCTGTC CACACTAGAC ATTAACCGGC 

351 TTCAGAAAAG ATGGACACCT TTTCCCACGC TGTTTCGCTT CTTAACTTTG 

401 GTCCAGCTTT AGCCACCACA CAGCGTGTGA GGGACTGCTG CTGCGGAGTC 

451 AGCCTCGTTT GTCCCTCCGC CTCCCACCAG CATGCGCCGC TTCTGAGAGA 

501 CACCAGCTCC CTGCCTCCAA GCCTGGTGCC ACAGGCCTGT CGTGAGGGAC 

551 CCCTGCTTCC GAGAGCTCCT GGGGGGGTTC TGCCCTTCAC CACCTGGGAG 

601 AGGTGTCAGT TCAGTTCCGA GTTGAACAAG GCCCGTGCAC ACAGCATGTT 

651 GGGGGCCCAG CCCAAAGTTC TTGTCACCTC CTCATGCAAA GCCAGCCATC 

701 ACCCTCCGGC CAGAGCTCAA GGTGGCCCCT TGGCCAGCCC CTCCTTGGGT 

751 CCTCCAGGAG GACTGAGCAC CCCTCCTAGC GGCATCCCTT GCCCTCCACA

801 GTGCTGCCAG GGGCACGTCG CTCTGTGCCG TGGACTGAGA CCATCCCCTG 

851 GTGACAGAAT GACCCGTTTG TTGGAAATGC CTCGTTGCCA GAGAAACTCC 

901 CCAGGCATCT CGGAACGAAA CTATTTAGTT CCATTGTGAA CTGGCCACGG 

951 GACAGCTTTT TATCAACTTA TTAAGTTGGA GCACTGTAAT CGCGCTTGCT 

1001 GAGTTAGCAG TGGTGGTAAG CGTGTGTTAA ACACATAATG TTACGTTTTA 

1051 GGAGAGAGAG GTCGTAAGGA AGTGTCGTGT CGCTCATGAC TCTCTTCTAT 

1101 TAGTTGGGTA ACAGTGGCCT CATGTTTGTG TCTGTGTGTA CACAGAGCCC 

1151 TTAGGTTCTG CTCTGTTTCT TTGCCAGGTG AATGTTTGTG GCATGCGCTG

1201 CTGTCCGCGC CCCTCTGTCC TGCGCAGGGT TCAGCTGTGC GGCGCCCTGA 

1251 TTTCCTCCAT GCACACAGAA CCTCCTTGTG TCTGTTTCTC TGTTCCTCTG 

1301 TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA 

1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



Entry HS548358 from database EMBL: 
human STS EST67250. 
Score = 2126, P = 1.5e-89, identities = 444/472

Entry HS670351 from database EMBL: 
human STS WI-18501. 
Score = 2089, P = 7.1e-88, identities = 445/476



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 361 bp to 936 bp; peptide length: 192 
Category: putative protein 
Classification: no clue 
1 MDTFSHAVSL LNFGPALATT QRVRDCCCGV SLVCPSASHQ HAPLLRDTSS

51 LPPSLVPQAC REGPLLPRAP GGVLPFTTWE RCQFSSELNK ARAHSMLGAQ 

101 PKVLVTSSCK ASHHPPARAQ GGPLASPSLG PPGGLSTPPS GIPCPPQCCQ 

151 GHVALCRGLR PSPGDRMTRL LEMPRCQRNS PGISERNYLV PL 
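
The start of this peptide can be reproduced by translating the insert from base 361 with the standard genetic code. The sketch below is an illustration only: the helper `translate` is hypothetical, the codon table is built from the conventional TCAG ordering, and just the first 39 nucleotides of the ORF are used.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# Nucleotides 361-399 of the DKFZphtes3_30f4 insert, copied from the listing.
orf_start = "ATGGACACCT" "TTTCCCACGC" "TGTTTCGCTT" "CTTAACTTT"
print(translate(orf_start))  # MDTFSHAVSLLNF
```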



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30f4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_30f4, frame 1

Report for DKFZphtes3_30f4 . 1

[LENGTH] 192 

[MW] 20281.56

[pI] 9.21

[BLOCKS] BL01013C Oxysterol-binding protein family proteins

[KW] All_Alpha

[KW] LOW_COMPLEXITY 10.94 %

SEQ MDTFSHAVSLLNFGPALATTQRVRDCCCGVSLVCPSASHQHAPLLRDTSSLPPSLVPQAC 

SEG 

PRD ccchhhhheeecccccchhhhhhhhcccceeeeccccccccccccccccccccccccccc 

SEQ REGPLLPRAPGGVLPFTTWERCQFSSELNKARAHSMLGAQPKVLVTSSCKASHHPPARAQ 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhccccceeeeecccccccccccccc 

SEQ GGPLASPSLGPPGGLSTPPSGIPCPPQCCQGHVALCRGLRPSPGDRMTRLLEMPRCQRNS 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccchhhhhhhhcccccccchhhhhccccccccc 

SEQ PGISERNYLVPL 

SEG 

PRD cccccccccccc 

(No Prosite data available for DKFZphtes3_30f4 . 1 ) 
(No Pfam data available for DKFZphtes3_30f4 . 1 )
DKFZphtes3_35b4



group: cell cycle 

DKFZphtes3_35b4 encodes a novel 1780 amino acid protein which is C-terminally identical to human
M-phase phosphoprotein-1 (MPP1).

The novel protein contains an N-terminal Pfam kinesin motor domain and an ATP/GTP-binding site
motif A (P-loop). MPP1 is expressed and phosphorylated in metaphase. Therefore, the novel
protein seems to be involved in the mitotic spindle during cell division.

The new protein can find application in modulation of the mitotic spindle. 



"M-phase phosphoprotein-1" extension 



motor protein 



Sequenced by DKFZ 

Locus: /map="750_H_1; 758_H_7; 759_C_9; 847_D_4; 906_D_1; 931_D_3; 944_C_1; 750_G_12;
800_A_U; 512.1 cR from top of Chr10 linkage group"

Insert length: 6284 bp 

No poly A stretch found, no polyadenylation signal found 



1 ATCGCAGTGC TGCTCGCGGG TCTGGCTAGT CAGGCGAAGT TTGCAGAATG 
51 GAATCTAATT TTAATCAAGA GGGAGTACCT CGACCATCTT ATGTTTTTAG 
101 TGCTGACCCA ATTGCAAGGC CTTCAGAAAT AAATTTCGAT GGCATTAAGC 
151 TTGATCTGTC TCATGAATTT TCCTTAGTTG CTCCAAATAC TGAGGCAAAC 
201 AGTTTCGAAT CTAAAGATTA TCTCCAGGTT TGTCTTCGAA TAAGACCATT 
251 TACACAGTCA GAAAAAGAAC TTGAGTCTGA GGGCTGTGTG CATATTCTGG 
301 ATTCACAGAC TGTTGTGCTG AAAGAGCCTC AATGCATCCT TGGTCGGTTA 
351 AGTGAAAAAA GCTCAGGGCA GATGGCACAG AAATTCAGTT TTTCCAAGGT 
401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC 
451 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC
501 GGGCTAACCA ATTCAGGAAA AACATATACA TTTCAAGGGA CAGAAGAAAA 
551 TATTGGCATT CTGCCTCGAA CTTTGAATGT ATTATTTGAT AGTCTTCAAG 
601 AAAGACTGTA TACAAAGATG AACCTTAAAC CACATAGATC CAGAGAATAC 
651 TTAAGGTTAT CATCAGAACA AGAGAAAGAA GAAATTGCTA GCAAAAGTGC 
701 ATTGCTTCGG CAAATTAAAG AGGTTACTGT GCATAATGAT AGTGATGATA 
751 CTCTTTATGG AAGTTTAACT AACTCTTTGA ATATCTCAGA GTTTGAAGAA 
801 TCCATAAAAG ATTATGAACA AGCCAACTTG AATATGGCTA ATAGTATAAA
851 ATTTTCTGTG TGGGTTTCTT TCTTTGAAAT TTACAATGAA TATATTTATG 
901 ACTTATTTGT TCCTGTATCA TCTAAATTCC AAAAGAGAAA GATGCTGCGC 
951 CTTTCCCAAG ACGTAAAGGG CTATTCTTTT ATAAAAGATC TACAATGGAT 
1001 TCAAGTATCT GATTCCAAAG AAGCCTATAG ACTTTTAAAA CTAGGAATAA 
1051 AGCACCAGAG TGTTGCCTTC ACAAAATTGA ATAATGCTTC CAGTAGAAGT 
1101 CACAGCATAT TCACTGTTAA AATATTACAG ATTGAAGATT CTGAAATGTC 
1151 TCGTGTAATT CGAGTCAGTG AATTATCTTT ATGTGATCTT GCTGGTTCAG 
1201 AACGAACTAT GAAGACACAG AATGAAGGTG AAAGGTTAAG AGAGACTGGG 
1251 AATATCAACA CTTCTTTATT GACTCTGGGA AAGTGTATTA ACGTCTTGAA 
1301 GAATAGTGAA AAGTCAAAGT TTCAACAGCA TGTGCCTTTC CGGGAAAGTA 
1351 AACTGACTCA CTATTTTCAA AGTTTTTTTA ATGGTAAAGG GAAAATTTGT 
1401 ATGATTGTCA ATATCAGCCA ATGTTATTTA GCCTATGATG AAACACTCAA 
1451 TGTATTGAAG TTCTCCGCCA TTGCACAAAA AGTTTGTGTC CCAGACACTT
1501 TAAATTCCTC TCAAGATAAA TTATTTGGAC CTGTCAAATC TTCTCAAGAT 
1551 GTATCACTAG ACAGTAATTC AAACAGTAAA ATATTAAATG TAAAAAGAGC 
1601 CACCATTTCA TGGGAAAATA GTCTAGAAGA TTTGATGGAA GACGAGGATT 
1651 TGGTTGAGGA GCTAGAAAAC GCTGAAGAAA CTCAAAATGT GGAAACTAAA 
1701 CTTCTTGATG AAGATCTAGA TAAAACATTA GAGGAAAATA AGGCTTTCAT 
1751 TAGCCACGAG GAGAAAAGAA AACTGTTGGA CTTAATAGAA GACTTGAAAA 
1801 AAAAACTGAT AAATGAAAAA AAGGAAAAAT TAACCTTGGA ATTTAAAATT
1851 CGAGAAGAAG TTACACAGGA GTTTACTCAG TATTGGGCTC AACGGGAAGC 
1901 TGACTTTAAG GAGACTCTGC TTCAAGAACG AGAGATATTA GAAGAAAATG 
1951 CTGAACGTCG TTTGGCTATC TTCAAGGATT TGGTTGGTAA ATGTGACACT 
2001 CGAGAAGAAG CAGCGAAAGA CATTTGTGCC ACAAAAGTTG AAACTGAAGA 
2051 AGCTACTGCT TGTTTAGAAC TAAAGTTTAA TCAAATTAAA GCTGAATTAG 
2101 CTAAAACCAA AGGAGAATTA ATCAAAACCA AAGAAGAGTT AAAAAAGAGA 
2151 GAAAATGAAT CAGATTCATT GATTCAAGAG CTTGAGACAT CTAATAAGAA 
2201 AATAATTACA CAGAATCAAA GAATTAAAGA ATTGATAAAT ATAATTGATC 
2251 AAAAAGAAGA TACTATCAAC GAATTTCAGA ACCTAAAGTC TCATATGGAA 
2301 AACACATTTA AATGCAATGA CAAGGCTGAT ACATCTTCTT TAATAATAAA 
2351 CAATAAATTG ATTTGTAATG AAACAGTTGA AGTACCTAAG GACAGCAAAT 
2401 CTAAAATCTG TTCAGAAAGA AAAAGAGTAA ATGAAAATGA ACTTCAGCAA 
2451 GATGAACCAC CAGCAAAGAA AGGGTCTATC CATGTTAGTT CAGCTATCAC
2501 TGAAGACCAA AAGAAAAGTG AAGAAGTGCG ACCGAACATT GCAGAAATTG 
2551 AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA 
2601 CTCACTATTG AGAATGAACT TAAAAATGAA AAGGAAGAAA AAGCAGAATT
2651 AAATAAACAG ATTGTTCATT TTCAGCAGGA ACTTTCTCTT TCTGAAAAAA 
2701 AGAATTTAAC TTTAAGTAAA GAGGTCCAAC AAATTCAGTC AAATTATGAT 
2751 ATTGCAATTG CTGAATTACA TGTGCAGAAA AGTAAAAATC AAGAACAGGA 
2801 GGAAAAGATC ATGAAATTGT CAAATGAGAT AGAAACTGCT ACAAGAAGCA 
2851 TTACAAATAA TGTTTCACAA ATAAAATTAA TGCACACGAA AATAGACGAA 
2901 CTACGTACTC TTGATTCAGT TTCTCAGATT TCAAACATAG ATTTGCTCAA 
2951 TCTCAGGGAT CTGTCAAATG GTTCTGAGGA GGATAATTTG CCAAATACAC 
3001 AGTTAGACCT TTTAGGTAAT GATTATTTGG TAAGTAAGCA AGTTAAAGAA 
3051 TATCGAATTC AAGAACCCAA TAGGGAAAAT TCTTTCCACT CTAGTATTGA 
3101 AGCTATTTGG GAAGAATGTA AAGAGATTGT GAAGGCCTCT TCCAAAAAAA 
3151 GTCATCAGAT TGAGGAACTG GAACAACAAA TTGAAAAATT GCAGGCAGAA 
3201 GTAAAAGGCT ATAAGGATGA AAACAATAGA CTAAAGGAGA AGGAGCATAA 
3251 AAACCAAGAT GACCTACTAA AAGAAAAAGA AACTCTTATA CAGCAGCTGA 
3301 AAGAAGAATT GCAAGAAAAA AATGTTACTC TTGATGTTCA AATACAGCAT 
3351 GTAGTTGAAG GAAAGAGAGC GCTTTCAGAA CTTACACAAG GTGTTACTTG 
3401 CTATAAGGCA AAAATAAAGG AACTTGAAAC AATTTTAGAG ACTCAGAAAG 
3451 TTGAACGTAG TCATTCAGCC AAGTTAGAAC AAGACATTTT GGAAAAGGAA 
3501 TCTATCATCT TAAAGCTAGA AAGAAATTTG AAGGAATTTC AAGAACATCT 
3551 TCAGGATTCT GTCAAAAACA CCAAAGATTT AAATGTAAAG GAACTCAAGC 
3601 TGAAAGAAGA AATCACACAG TTAACAAATA ATTTGCAAGA TATGAAACAT 
3651 TTACTTCAAT TAAAAGAAGA AGAAGAAGAA ACCAACAGGC AAGAAACAGA 
3701 AAAATTGAAA GAGGAACTCT CTGCAAGCTC TGCTCGTACC CAGAATCTGA 
3751 AAGCAGATCT TCAGAGGAAG GAAGAAGATT ATGCTGACCT GAAAGAGAAA
3801 CTGACTGATG CCAAAAAGCA GATTAAGCAA GTACAGAAAG AGGTATCTGT 
3851 AATGCGTGAT GAGGATAAAT TACTGAGGAT TAAAATTAAT GAACTGGAGA 
3901 AAAAGAAAAA CCAGTGTTCT CAGGAATTAG ATATGAAGCA GCGAACCATT 
3951 CAGCAACTCA AGGAGCAGTT AAATAATCAG AAAGTGGAAG AAGCTATACA 
4001 ACAGTATGAG AGAGCATGCA AAGATCTAAA TGTTAAAGAG AAAATAATTG 
4051 AAGACATGCG AATGACACTA GAAGAACAGG AACAAACTCA GGTAGAACAG 
4101 GATCAAGTGC TTGAGGCTAA ATTAGAGGAA GTTGAAAGGC TGGCCACAGA 
4151 ATTGGAAAAA TGGAAGGAAA AATGCAATGA TTTGGAAACC AAAAACAATC 
4201 AAAGGTCAAA TAAAGAACAT GAGAACAACA CAGATGTGCT TGGAAAGCTC
4251 ACTAATCTTC AAGATGAGTT ACAGGAGTCT GAACAGAAAT ATAATGCTGA 
4301 TAGAAAGAAA TGGTTAGAAG AAAAAATGAT GCTTATCACT CAAGCGAAAG
4351 AAGCAGAGAA TATACGAAAT AAAGAGATGA AAAAATATGC TGAGGACAGG
4401 GAGCGTTTTT TTAAGCAACA GAATGAAATG GAAATACTGA CAGCCCAGCT
4451 GACAGAGAAA GATAGTGACC TTCAAAAGTG GCGAGAAGAA CGAGATCAAC
4501 TGGTTGCAGC TTTAGAAATA CAGCTAAAAG CACTGATATC CAGTAATGTA
4551 CAGAAAGATA ATGAAATTGA ACAACTAAAA AGGATCATAT CAGAGACTTC
4601 TAAAATAGAA ACACAAATCA TGGATATCAA GCCCAAACGT ATTAGTTCAG 
4651 CAGATCCTGA CAAACTTCAA ACTGAACCTC TATCGACAAG TTTTGAAATT
4701 TCCAGAAATA AAATAGAGGA TGGATCTGTA GTCCTTGACT CTTGTGAAGT
4751 GTCAACAGAA AATGATCAAA GCACTCGATT TCCAAAACCT GAGTTAGAGA 
4801 TTCAATTTAC ACCTTTACAG CCAAACAAAA TGGCAGTGAA ACACCCTGGT
4851 TGTACCACAC CAGTGACAGT TGAGATTCCC AAGGCTCGGA AGAGGAAGAG
4901 TAATGAAATG GAGGAGGACT TGGTGAAATG TGAAAATAAG AAGAATGCTA
4951 CACCCAGAAC TAATTTGAAA TTTCCTATTT CAGATGATAG AAATTCTTCT 
5001 GTCAAAAAGG AACAAAAGGT TGCCATACGT CCATCATCTA AGAAAACATA 
5051 TTCTTTACGG AGTCAGGCAT CCATAATTGG TGTAAACCTG GCCACTAAGA 
5101 AAAAAGAAGG AACACTACAG AAATTTGGAG ACTTCTTACA ACATTCTCCC 
5151 TCAATTCTTC AATCAAAAGC AAAGAAGATA ATTGAAACAA TGAGCTCTTC 
5201 AAAGCTCTCA AATGTAGAAG CAAGTAAAGA AAATGTGTCT CAACCAAAAC 
5251 GAGCCAAACG GAAATTATAC ACAAGTGAAA TTTCATCTCC TATTGATATA 
5301 TCAGGCCAAG TGATTTTAAT GGACCAGAAA ATGAAGGAGA GTGATCACCA 
5351 GATTATCAAA CGACGACTTC GAACAAAAAC AGCCAAATAA ATCACTTATG 
5401 GAAATGTTTA ATATAAATTT TATAGTCATA GTCATTGGAA CTTGCATCCT 
5451 GTATTGTAAA TATAAATGTA TATATTATGC ATTAAATCAC TCTGCATATA
5501 GATTGCTGTT TTATACATAG TATAATTTTA ATTCAATAAA TGAGTCAAAA 
5551 TTTGTATATT TTTATAAGGC TTTTTTATAA TAGCTTCTTT CAAACTGTAT 
5601 TTCCCTATTA TCTCAGACAT TGGATCAGTG AAGATCCTAG GAAAGAGGCT 
5651 GTTATTCTCA TTTATTTTGC TATACAGGAT GTAATAGGTC AGGTATTTGG 
5701 TTTACTTATA TTTAACAATG TCTTATGAAT TTTTTTTACT TTATCTGTTA 
5751 TACAACTGAT TTTACATATC TGTTTGGATT ATAGCTAGGA TTTGGAGAAT 
5801 AAGTGTGTAC AGATCACAAA ACATGTATAT ACATTATTTA GAAAAGATCT 
5851 CAAGTCTTTA ATTAGAATGT CTCACTTATT TTGTAAACAT TTTGTGGGTA 
5901 CATAGTACAT GTATATATTT ACGGGGTATG TGAGATGTTT TGACACAGGC 
5951 ATGCAATGTG AAATACGTGT ATCATGGAGA ATGAGGTATC CATCCCCTCA 
6001 AGCATTTTTC CTTTGAATTA CAGATAATCC AATTACATTC TTTAGATCAT 
6051 TTAAAAATAT ACAAGTAAGT TATTATTGAT TATAGTCACT CTATTGTGCT 
6101 ATCAGATAGT AGATCATTCT TTTTATCTTA TTTGTTTTTG TACCCATTAA 
6151 CCATCCCCAC CTCCCCCTGC AACCGTCAGT ACCCTTACCA GCCACTGGTA 
6201 ACCATTCTTC TACTCTGTAT GCCCATGAGG TCAATTGATT TTATTTTTAG 
6251 ATCCCATAAA TAAATGAGAA CATGCAAAAA AAAA 
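
Position labels in a scanned listing like this one can be damaged (for example, a label split into `4 51`), so it is safer to rebuild the sequence from the base blocks alone and then compare its length with the stated insert length. A minimal sketch, assuming every row contains only a position label and base blocks; the function name `parse_listing` is hypothetical.

```python
import re

def parse_listing(lines):
    """Join the base blocks of a numbered listing into one sequence."""
    # Drop everything that is not a base symbol: labels, spaces, stray marks.
    return "".join(re.sub(r"[^ACGTN]", "", line.upper()) for line in lines)

rows = [
    "401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC",
    "4 51 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC",
]
seq = parse_listing(rows)
print(len(seq))  # 100
```

Applied to the full listing, the resulting length can be compared with the insert length given in the header; a mismatch points to rows that were lost or duplicated during extraction.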



BLAST Results 



Entry HS898149 from database EMBL:
human STS WI-9217.
Score = 4247, P = 1.5e-187, identities = 855/862



Medline entries



94119956: 

Cloning of cDNAs for M-phase phosphoproteins recognized 
by the MPM2 monoclonal antibody and determination of the 
phosphorylated epitope. 

98101856: 

Interaction of a Golgi-associated kinesin-like protein with 
Rab6. 

95122643: 

Identification and partial characterization of mitotic
centromere-associated kinesin, a kinesin-related protein that
associates with centromeres during mitosis.



Peptide information for frame 3 



ORF from 48 bp to 5387 bp; peptide length: 1780 
Category: known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (152-160) 



1 MESNFNQEGV PRPSYVFSAD PIARPSEINF DGIKLDLSHE FSLVAPNTEA 

51 NSFESKDYLQ VCLRIRPFTQ SEKELESEGC VHILDSQTVV LKEPQCILGR 

101 LSEKSSGQMA QKFSFSKVFG PATTQKEFFQ GCIMQPVKDL LKGQSRLIFT

151 YGLTNSGKTY TFQGTEENIG ILPRTLNVLF DSLQERLYTK MKLKPHRSRE 

201 YLRLSSEQEK EEIASKSALL RQIKEVTVHN DSDDTLYGSL TNSLNISEFE 

251 ESIKDYEQAN LNMANSIKFS VWVSFFEIYN EYIYDLFVPV SSKFQKRKML

301 RLSQDVKGYS FIKDLQWIQV SDSKEAYRLL KLGIKHQSVA FTKLNNASSR 

351 SHSIFTVKIL QIEDSEMSRV IRVSELSLCD LAGSERTMKT QNEGERLRET 

401 GNINTSLLTL GKCINVLKNS EKSKFQQHVP FRESKLTHYF QSFFNGKGKI 

451 CMIVNISQCY LAYDETLNVL KFSAIAQKVC VPDTLNSSQD KLFGPVKSSQ

501 DVSLDSNSNS KILNVKRATI SWENSLEDLM EDEDLVEELE NAEETQNVET 

551 KLLDEDLDKT LEENKAFISH EEKRKLLDLI EDLKKKLINE KKEKLTLEFK 

601 IREEVTQEFT QYWAQREADF KETLLQEREI LEENAERRLA IFKDLVGKCD 

651 TREEAAKDIC ATKVETEEAT ACLELKFNQI KAELAKTKGE LIKTKEELKK 

701 RENESDSLIQ ELETSNKKII TQNQRIKELI NIIDQKEDTI NEFQNLKSHM 

751 ENTFKCNDKA DTSSLIINNK LICNETVEVP KDSKSKICSE RKRVNENELQ 

801 QDEPPAKKGS IHVSSAITED QKKSEEVRPN IAEIEDIRVL QENNEGLRAF 

851 LLTIENELKN EKEEKAELNK QIVHFQQELS LSEKKNLTLS KEVQQIQSNY 

901 DIAIAELHVQ KSKNQEQEEK IMKLSNEIET ATRSITNNVS QIKLMHTKID 

951 ELRTLDSVSQ ISNIDLLNLR DLSNGSEEDN LPNTQLDLLG NDYLVSKQVK

1001 EYRIQEPNRE NSFHSSIEAI WEECKEIVKA SSKKSHQIEE LEQQIEKLQA 

1051 EVKGYKDENN RLKEKEHKNQ DDLLKEKETL IQQLKEELQE KNVTLDVQIQ 

1101 HVVEGKRALS ELTQGVTCYK AKIKELETIL ETQKVERSHS AKLEQDILEK 

1151 ESIILKLERN LKEFQEHLQD SVKNTKDLNV KELKLKEEIT QLTNNLQDMK 

1201 HLLQLKEEEE ETNRQETEKL KEELSASSAR TQNLKADLQR KEEDYADLKE 

1251 KLTDAKKQIK QVQKEVSVMR DEDKLLRIKI NELEKKKNQC SQELDMKQRT 

1301 IQQLKEQLNN QKVEEAIQQY ERACKDLNVK EKIIEDMRMT LEEQEQTQVE 

1351 QDQVLEAKLE EVERLATELE KWKEKCNDLE TKNNQRSNKE HENNTDVLGK 

1401 LTNLQDELQE SEQKYNADRK KWLEEKMMLI TQAKEAENIR NKEMKKYAED 

1451 RERFFKQQNE MEILTAQLTE KDSDLQKWRE ERDQLVAALE IQLKALISSN 

1501 VQKDNEIEQL KRIISETSKI ETQIMDIKPK RISSADPDKL QTEPLSTSFE 

1551 ISRNKIEDGS VVLDSCEVST ENDQSTRFPK PELEIQFTPL QPNKMAVKHP 

1601 GCTTPVTVEI PKARKRKSNE MEEDLVKCEN KKNATPRTNL KFPISDDRNS 

1651 SVKKEQKVAI RPSSKKTYSL RSQASIIGVN LATKKKEGTL QKFGDFLQHS

1701 PSILQSKAKK IIETMSSSKL SNVEASKENV SQPKRAKRKL YTSEISSPID 

1751 ISGQVILMDQ KMKESDHQII KRRLRTKTAK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b4, frame 3 

TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds., N = 1, Score = 3743, P = 0
PIR:A36881 MPM2-reactive phosphoprotein 1 - human (fragment), N = 2,
Score = 2808, P = 2.5e-294

TREMBL:AF070672_1 product: "rabkinesin6"; Homo sapiens rabkinesin6
mRNA, complete cds., N = 2, Score = 680, P = 2.6e-99



>TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds. 
Length = 753



Score = 3743 (561.6 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 752/753 (99%), Positives = 753/753 (100%)

Query: 1028 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 1087 

VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 60

Query: 1088 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 
Sbjct: 61 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 120

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 1207

LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE
Sbjct: 121 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 180

Query: 1208 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 1267 

EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 
Sbjct: 181 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 240 

Query: 1268 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 1327 

VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 
Sbjct: 241 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 300 

Query: 1328 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 1387 

NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 
Sbjct: 301 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 360 

Query: 1388 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 1447 

NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 
Sbjct: 361 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 420 

Query: 1448 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 1507

AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI
Sbjct: 421 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 480

Query: 1508 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 1567 

EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE
Sbjct: 481 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 540 

Query: 1568 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK 1627 

VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTV+IPKARKRKSNEMEEDLVK 
Sbjct: 541 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK 600 

Query: 1628 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 1687 

CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 
Sbjct: 601 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 660 

Query: 1688 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 1747 

GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 
Sbjct: 661 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 720

Query: 1748 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 1780

PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK
Sbjct: 721 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 753
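
The single non-identical position in this 752/753 alignment can be located by comparing the two aligned rows column by column. The sketch below uses one 60-residue row pair from the alignment above (query positions 1568 to 1627); the function name `compare_aligned` is hypothetical, and the code assumes ungapped rows of equal length.

```python
def compare_aligned(query: str, sbjct: str):
    """Return (identical_count, mismatch_indices) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    mismatches = [i for i, (q, s) in enumerate(zip(query, sbjct)) if q != s]
    return len(query) - len(mismatches), mismatches

query = "VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK"
sbjct = "VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK"
identical, diffs = compare_aligned(query, sbjct)
# Convert 0-based row offsets to query coordinates (row starts at 1568).
print(identical, [1568 + i for i in diffs])  # 59 [1609]
```

Under these assumptions the difference is the E in the query at position 1609 against a K in the database entry, which the match line scores as a conservative substitution.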

Score = 197 (29.6 bits), Expect = 2.1e-11, P = 2.1e-11
Identities = 114/542 (21%), Positives = 253/542 (46%)

Query: 692 IKTKEELKKRENESDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHM- 750 

+K + + E + I++L+ K +N R+KE + ++D + E + L + 
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEH--KNQDDLLKEKETLIQQLK 58

Query: 751 ENTFKCNDKADTS-SLI INNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAK-- 807 

E + N D ++ K +E + K+KI E + + E + + AK 

SbjCt: 59 EELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKI-KELETILETQKVERSHSAKLE 117 

Query: 808 KGSIHVSSAITEDQKKSEEVRPNIAE-IEDIRVLQENNEGLRAFLLTIENELKNEK 862 
+ + S I + ++ +E + ++ + +++ + L L+ + + N L++ K

Sbjct: 118 QDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQ 177 

Query: 863 --EEKAELNKQIVH-FQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEE 919 

EE+ E N+Q + + ELS S + L ++Q+ + +Y A+L K K + ++ 
Sbjct: 178 LKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDY ADL KEKLTDAKK 230 

Query: 920 KIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQISNIDLLNLRDLSNGSEE 978 

+1 ++ E+ S+ + + KL+ KI+EL + + SQ +D+ R + E+ 

SbjCt: 231 QIKQVQKEV SVMRD--EDKLLRIKINELEKKKNQCSQ--ELDMKQ-RTIQQLKEQ 280 

Query: 979 DNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAIWEECKEIVKASSKKSHQI 1038 

N N +++ Y + K+ ++E E+ ++E + E + K + + 

SbjCt: 281 LN— NQKVEEAIQQY — ERACKDLNVKEKI IED-MRMTLEEQEQTQVEQDQVLEAKLEEV 335 

Query: 1039 EELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEELQEKNVT 1094 

E L ++EK + + + +NN+ KEH+N D+L + L +L+E Q+ N 
SbjCt: 336 ERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKW 395 

Query: 1095 LDVQIQHVVEGKRA LS ELTQGVTC YKAKI KELET I LETQKVERSH SAKLEQDI 1147 

L++++ + KA ++ + + + E+E IL Q E+ + ++ 

SbjCt: 396 LEEKMMLITQAKEAENIRNKEMKKYAEDRERFFKQQNEME-ILTAQLTEKDSDLQKWRE- 453 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELK-LKEEITQLTNNLQDMKHLLQLK 1206 

E++ ++ LE LK + +V+ KD +++LK + E +++ + D+K + 

Sbjct: 454 -ERDQLVAALEIQLKAL ISSNVQ— KDNEIEQLKRI ISETSKIETQIMDIK---PKR 504 

Query: 1207 EEEEETNRQETEKLKEELSASSARTQN 1233 

+ ++ +TE L S + ++ 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIED 531 

Score = 186 (27.9 bits), Expect = 3.2e-10, P = 3.2e-10
Identities = 131/674 (19%), Positives = 294/674 (43%) 

Query: 673 LELKFNQIKAELAKTKGELIKT-KEELKKRENESDSLIQELETSNKKIITQNQRIKELIN 731 

L+ K ++ + +L K K LI+ KEEL+++ 0 IQ + + + Q + 
SbjCt: 35 LKEKEHKNQDDLLKEKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKA 94 

Query: 732 IIDQKEDTINEFQNL-KSHMENTFKCNDKADTSSLI INNKLICNETVEVPKDSKSKICSE 790 

I + E TI E Q + +SH + D + S + I+ + EE +DS 
Sbjct: 95 KIKELE-TILETQKVERSHSAKLEQ— DILEKESIILKLERNLKEFQEHLQDS-r VKN 147 

Query: 791 RKRVNENELQ-QDEPPAKKGSIHVSSAITEDQKKSEEV-RPNIAEI-EDIRVLQENNEGL 847 

K +N EL+ ++E ++ + + +++ EE R ++ E++ + L 

SbjCt: 148 TKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNL 207 

Query: 848 RAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQI QSNYDI 902 

+A L E + + KE+ + KQI Q+E+S+ ++ L ++ ++ Q + + + 

SbjCt: 208 KADLQRKEEDYADLKEKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL 267 

Query: 903 AIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQI 961 

+ + +Q+ K Q +K+ + +EA + + I+M ++E +T Q+ 

Sbjct: 268 DMKQRTIQQLKEQLNNQKVEEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQV 327 

Query: 962 SNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRI — QEPNRENSFHSSIEA 1019 

L + L+ E+ L+ N + + + N ++ S + 

Sbjct: 328 LEAKLEEVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQK 387 

Query; 1020 IWEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQ— DDLLKEK 1077 

+ K+ ++ Q +E E K E+K Y ++ R ++ + + + + L EK 

Sbjct: 388 YNADRKKWLEEKMMLITQAKEAENIRNK EMKKYAEDRERFFKQQNEMEILTAQLTEK 444 

Query: 1078 ETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVER 1137 

+ + +q + +ee + L++Q++ ++ + + ++ + + ET + K +R 

SbjCt: 445 DSDLQKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKR 504 

Query: 1138 SHSAKLEQDILEKESIILKLERNLKEFQEHLQDS VKNTKDLNVKELKLKEEITQLT 1193 

SA ++ E S ++ RN E + DS +N + + +L+ + T L 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQ 564 

Query: 1194 NNLQDMKH LLQLKEEEEETNRQETEKLKEEL-SASSARTQNLKADLQRKEEDYADLK 1249 

M +RH + + + ++++ +++E+L + + + +L+ D + 

Sbjct: 565 PNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSS 624 

Query: 1250 EKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL-DMKQRTIQQLKEQL 1308 

K + K 1+ K+ +R + + I +N KKK Q+ D Q + L+ + 
SbjCt: 625 VK-KEQKVAIRPSSKKTYSLRSQASI — IGVNLATKKKEGTLQKFGDFLQHSPSILQSKA 681 

Query: 1309 NNQKVEEAIQQYERACKDLNVKEKIIEDMR 1338 

+K+ E+ ++ ++KE++ R 
SbjCt: 682 — KKIIETMSSSKLSNVEAS-KENVSQPKR 708 
Score = 165 (24.8 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 140/626 (22%), Positives = 271/626 (43%)

Query: 536 VEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEK- 594

+EELE E E K +D + L+E + H+ + LL EL ++L E +EK 
Sbjct: 11 IEELEQQIEKLQAEVKGY-KDENNRLKEKE HKNQDDLLKEKETLIQQLKEELQEKN 65 

Query: 595 LTLEFKIREEVT QEFTQYWAQREADFKE--TLLQEREILEENAERRLAIFKDLVG 647 

+TL+ +1+ V E TQ +A KE T+L+ +++ E + +L +D++ 

Sbjct: 66 VTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV-ERSHSAKLE — QDILE 122 

Query: 64 8 KCDT REEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENE 704 

K E K+ ++ ■+ T L +K ++K E+ + L K L+ +E E 

Sbjct: 123 KESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEE 182 

Query: 705 SDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSS 764 

++ QE E +++ + R + L + +KE+ ++ ++ K K+S 
SbjCt: 183 EETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQK-EVSV 241 

Query: 765 LI INNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKS 824 

+ +KL+ + E+ K K CS+ + + +QQ + V AI + ++ 

Sbjct: 242 MRDEDKLLRIKINELEK — KKNQCSQELDMKQRTIQQLKEQLNNQK — VEEAIQQYERAC 297 

Query: 825 EEVRPNIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEK 884 

+++ IED+R+ E E + + + L+ + EE L ++ ++++ + E 
Sbjct: 298 KDLNVKEKI I EDMRMTLEEQEQTQ VEQDQVLEAKLEEVERLATELEKWKEKCNDLET 354 

Query: 885 KNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNE-IETATRSITN N 938 

KN S + + ++N D+ + +L + + QE E+K + +E IT N 

Sbjct: 355 KNNQRSNK — EHENNTDV-LGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAEN 411 

Query: 939 VSQIKLMHTKIDELRTLDSVSQISNIDL-LNLRD--LSNGSEEDNLPNTQLDLLGNDYLV 995 

+ ++ D R +++ + L +D L EE + L++ + 

Sbjct: 412 IRNKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALIS 471 

Query: 996 SKQVKEYRIQEPNRENSFHSSIEA-IWE-ECKEIVKASSKKSHQIEELEQQIEKLQAEVK 1053 

S K+ I++ RSSIEI++KIA KQEL E+ +++ 
Sbjct: 472 SNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKL-QTEPLSTSFEISRNKIE 530 

Query: 1054 GYKDENNRLKEKEHKNQDDLLKEKE TLIQQLKEELQEKNVTLDVQIQHVVEGKRA 1108 

+ + +Q + E T +Q K ++ T V ++ KR 

Sbjct: 531 DGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRK 590 

Query: 1109 LSELTQC-VTCYKAKIKELETILETQ-KVERSHSAKLEQDILEKES 1152 

+E+ + V C K T L+ +R+ S K EQ + + S 

Sbjct: 591 SNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPS 636 

Score = 143 (21.5 bits), Expect = 1.3e-05, P = 1.3e-05
Identities = 164/684 (23%), Positives = 304/684 (44%)

Query: 295 QKRKMLR-LSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASS 349 

+K +++ L ++++ + D+Q V + K A L G+ +L 
Sbjct: 49 EKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV 108 

Query: 350 -RSHSI-FTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGE-RLRETGNINTS 406 

RSHS IL+ E + +E LS+KNE +L+E T+ 

Sbjct: 109 ERSHSAKLEQDILEKESIILKLERNLKEFQE-HLQDSVKNTKDLNVKELKLKEEITQLTN 167 

Query: 407 LLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDET 466

L K + LK E+ +Q + +L+ N K + + Y E 

Sbjct: 168 NLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADL— -QRKEEDYADLKEK 224 

Query: 467 LNVLKFSAIAQKVCVPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSL 526

L K I Q V ++ +DKL +K ++ + N S+ L++K+ TI 
Sbjct: 225 LTDAK-KQIKQ-VQKEVSVMRDEDKLLR-IKINE-LEKKKNQCSQELDMKQRTIQQLKEQ 280 

Query: 527 EDLMEDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLKK 585 

+ + E+ +++ E A + NV+ K++ ED+ TLEE + + E+ ++L+ +E++++ 
Sbjct: 281 LNNQKVEEAIQQYERACKDLNVKEKII-EDMRMTLEEQEQ— TQVEQDQVLEAKLEEVER 337 

Query: 586 KLIN-EK-KEKLT-LEFKIREEVTQEFTQYWAQREADFKETLLQEREILEE NAERR 638 

EK KEK LE K + +E + K T LQ+ E+ E NA+R+ 

Sbjct: 338 LATELEKWKEKCNDLETKNNQRSNKEHEN NTDVLGKLTNLQD-ELQESEQKYNADRK 393 

Query: 639 LAIFKDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEEL 698 

+ + ++ T+ + A++I K E + + E F Q + E+ +L + +L 

Sbjct: 394 KWLEEKMM--LITQAKEAENI-RNK-EMKKYAEDRERFFKQ-QNEMEILTAQLTEKDSDL 448

Query: 699 KKRENESDSLIQELETSNKKI ITQN-QR I KELINI I DQKEDT I NEFQNLKSHMENTF 754 

+K E D L+ LE K +1+ N Q+ I++L II + + ++K ++ 
Sbjct: 449 QKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSA 508

Query: 755 KCNDKADTSSLIINNKLICN — ETVEVPKDSKSKICSERK RVNENELQ-QDEP--PA 806 

DK T L +• ++ N E V DS ++ +E R + EL+ Q P P 

Sbjct: 509 D-PDKLQTEPLSTSFEISRNKIEDGSWLDS-CEVSTENDQSTRFPKPELEIQFTPLQPN 566 

Query: 807 KKGSIH — VSSAITEDQKKSEEVRPNIAEIEDI RVLQENNEGLRA FLLTIENELKNE 861 

K H +++T K+ + +NE + ++ +N R F+++ + 
Sbjct: 567 KMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVK 626

Query: 862 KEEKAEL NKQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQE 918 

KE+K + +K+ + + S+ NL K+ +Q D + +SK ++ 

Sbjct: 627 KEQKVAIRPSSKKTYSLRSQASIIGV-NLATKKKEGTLQKFGDFLQHSPSILQSKAKKII 685 

Query: 919 EKIM — KLSNEIETATRSITNNVSQIKLMHTKI — DELRT-LDSVSQISNID 965 

E + KLSN +E + NVSQ K K+ £+ + +D Q+ +D 

Sbjct: 686 ETMSSSKLSN-VEASKE NVSQPKRAKRKLYTSEISSPI DISGQVILMD 732 

Score = 133 (20.0 bits), Expect = 1.6e-04, P = 1.6e-04 
Identities = 94/426 (22%), Positives = 188/426 (44%) 

Query: 527 EDLM-EDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLK 584 

+DL+ E E L+++L+ + +NV LD + +E +A + I++L+ 

Sbjct: 44 DDLLKEKETLIQQLKEELQEKNVT---LDVQIQHVVEGKRALSELTQGVTCYKAKIKELE 100 

Query: 585 KKLINEKKEKLTLEFKIREEVTQ-EFTQYWAQREA-DFKETLLQEREILEENAERRLAIF 642 

L +K E+ + K+ +++ + E +R +F+E L + ++ + L + 

Sbjct: 101 TILETQKVER-SHSAKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKL- 158 

Query: 643 KDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRE 702 

K+++ + K + K E EE + ++K EL+ + K +L+++E 

Sbjct: 159 KEEITQLTNNLQDMKHLLQLKEEEEETN RQETEKLKEELSASSARTQNLKADLQRKE 215 

Query: 703 NESDSLIQELETSNKKIITQNQRIKELINIIDQK-EDTINEFQNLKSHMENTFKCNDKA- 760 

+ L ++L T KK I Q Q+ ++ D+ INE + K+ + 

Sbjct: 216 EDYADLKEKL-TDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTI 274 

Query: 761 DTSSLIINNKLICNETVE VPKDS--KSKICSE-RKRVNENE— LQQDEPPAKKGS 810 

+NN+ + E ++ KD K KI + R + E E ++QD+ K 

Sbjct: 275 QQLKEQLNNQKV-EEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLE 333 

Query: 811 IHVSSAITEDQKKSEEVRP-NIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELN 869 

V TE +K E+ + ENN + L +++EL+ E E+K + 

Sbjct: 334 -EVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQ-ESEQKYNAD 391 

Query: 870 KQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNEIE 929 

++ ++++ L +T +KE + I++ + K E E+ K NE+E 

Sbjct: 392 RK-KWLEEKMML ITQAKEAENIRNK EMKKY AE DRE RF FKQQN EME 435 

Query: 930 TATRSITNNVSQIKLMHTKIDEL 952 

T +T S ++ + D+L 
Sbjct: 436 ILTAQLTEKDSDLQKWREERDQL 458 

Pedant information for DKFZphtes3_35b4, frame 3 



Report for DKFZphtes3_35b4 . 3 

[LENGTH] 1780 

[MW] 206176.77 

[pI] 5.60 

[HOMOL] TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase phosphoprotein-1 mRNA, partial cds. 0.0 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YEL061c] 2e-37 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YEL061c] 2e-37 

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YEL061c] 2e-37 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YEL061c] 2e-37 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-30 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-30 

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 3e-23 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-21 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 

[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-15 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 2e-14 

[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07 

[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-07 

[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-07 

[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06 

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHRlSSc] 3e-06 

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 2e-05 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YAL035w] 2e-04 

[FUNCAT] r general function prediction [M. jannaschii, MJ1254] 0.001 

[BLOCKS] BL00387A 

[BLOCKS] BL00411H 

[BLOCKS] BL00411G 

[BLOCKS] BL00411F 

[BLOCKS] BL00411E Kinesin motor domain proteins 

[BLOCKS] BL00411D Kinesin motor domain proteins 

[BLOCKS] BL00411C Kinesin motor domain proteins 

[BLOCKS] BL00411B Kinesin motor domain proteins 

[BLOCKS] BL00411A Kinesin motor domain proteins 

[SCOP] d2kin.l 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 2e-68 

[SCOP] d2tmab_ 1.105.4.1.1 Tropomyosin [rabbit (Oryctolagus cuniculus) 4e-05 

[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 2e-09 

[EC] 3.6.1.32 Myosin ATPase 5e-25 

[PIRKW] nucleus 4e-27 

[PIRKW] phosphotransferase 3e-16 

[PIRKW] duplication 6e-20 

[PIRKW] citrulline 6e-18 

[PIRKW] tandem repeat 4e-24 

[PIRKW] heterodimer 3e-28 

[PIRKW] endocytosis 1e-23 

[PIRKW] heart 1e-17 

[PIRKW] transmembrane protein 2e-28 

[PIRKW] serine/threonine-specific protein kinase 3e-16 

[PIRKW] zinc finger 1e-23 

[PIRKW] surface antigen 2e-16 

[PIRKW] DNA binding 1e-25 

[PIRKW] metal binding 1e-23 

[PIRKW] muscle contraction 4e-24 

[PIRKW] heterotetramer 4e-24 

[PIRKW] acetylated amino end 2e-19 

[PIRKW] actin binding 5e-25 

[PIRKW] mitosis 3e-58 

[PIRKW] microtubule binding 3e-58 

[PIRKW] ATP 3e-58 

[PIRKW] thick filament 4e-24 

[PIRKW] phosphoprotein 9e-29 

[PIRKW] leucine zipper 1e-12 

[PIRKW] skeletal muscle 8e-24 

[PIRKW] disulfide bond 1e-12 

[PIRKW] heterotrimer 1e-29 

[PIRKW] calcium binding 6e-18 

[PIRKW] alternative splicing 4e-21 

[PIRKW] P-loop 2e-63 

[PIRKW] coiled coil 3e-58 

[PIRKW] heptad repeat 1e-25 

[PIRKW] methylated amino acid 4e-24 

[PIRKW] peripheral membrane protein 1e-23 

[PIRKW] dimer 1e-12 

[PIRKW] cardiac muscle 1e-17 

[PIRKW] hydrolase 5e-25 

[PIRKW] microtubule 6e-15 

[PIRKW] muscle 7e-23 

[PIRKW] membrane protein 6e-20 

[PIRKW] GTP binding 8e-22 

[PIRKW] EF hand 6e-18 

[PIRKW] cell division 1e-25 

[PIRKW] cytoskeleton 4e-24 

[PIRKW] hair 6e-18 

[PIRKW] Golgi apparatus 8e-24 

[PIRKW] calmodulin binding 1e-23 

[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-16 

[SUPFAM] myosin motor domain homology 5e-25 

[SUPFAM] alpha-actinin actin-binding domain homology 1e-13 

[SUPFAM] kinesin-related protein KIP1 9e-27 

[SUPFAM] kinesin-related protein CIN8 4e-36 

[SUPFAM] kinesin heavy chain 4e-24 

[SUPFAM] plectin 1e-13 

[SUPFAM] trichohyalin 6e-18 

[SUPFAM] kinesin-related protein KIF3 1e-29 

[SUPFAM] kinesin-related protein KIF2 3e-20 

[SUPFAM] ribosomal protein S10 homology 1e-13 

[SUPFAM] giantin 8e-24 

[SUPFAM] protein kinase homology 3e-16 

[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13 

[SUPFAM] kinesin-related protein unc-104 8e-26 

[SUPFAM] human early endosome antigen 1 1e-23 

[SUPFAM] unassigned kinesin-related proteins 1e-28 

[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17 

[SUPFAM] myosin heavy chain 5e-25 

[SUPFAM] conserved hypothetical P115 protein 4e-20 

[SUPFAM] centromere protein E 5e-24 

[SUPFAM] calmodulin repeat homology 6e-18 

[SUPFAM] kinesin-related protein KLP61F 1e-25 

[SUPFAM] hypothetical protein MJ0914 3e-12 

[SUPFAM] kinesin-related protein MKLP-1 2e-63 

[SUPFAM] pleckstrin repeat homology 8e-26 

[SUPFAM] hypothetical protein MJ1322 4e-13 

[SUPFAM] kinesin-related protein KIF1B 3e-28 

[SUPFAM] kinesin motor domain homology 2e-63 

[SUPFAM] kinesin-related protein KLPA 7e-25 

[SUPFAM] kinesin-related protein nodA 1e-12 

[SUPFAM] kinesin-related protein Eg5 5e-30 

[PROSITE] ATP_GTP_A 1 

[PFAM] Kinesin motor domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 7.53 % 

[KW] COILED_COIL 19.78 % 



SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ 

SEG 

COILS 

3kar- 

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFG 

SEG 

COILS 

3kar- 

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF 

SEG 

COILS 

3kar- 

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL 

SEG 

COILS 

3kar- 

SEQ TNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML 

SEG 

COILS 

3kar- EEEEEEEEEEETTEEEETTTCC CCEE 

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL 

SEG 

COILS 

3kar- EEETTTTE-EEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEE 

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS 

SEG 

COILS 

3kar- E--EETTTTCEEEEEEEEEECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT 

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVMISQCYLAYDETLNVLKFSAIAQKVC 

SEG 

COILS 

3kar- TTTT— TCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHH 

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE 

SEG xxxxxxxxxxxxxxxxxx 

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC 

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS CCCC 

3kar- 

SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY 

SEG xxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- 

SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL 

SEG xxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG .xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE 

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN 

SEG 

COILS 

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ VVLDSCEVSTENDQSTRFPKPELEIQFTPLQPHKMAVKHPGCTTPVTVEIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN 

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 



Prosite for DKFZphtes3_35b4 . 3 



PS00017 152->160 ATP GTP A PDOC00017 



Pfam for DKFZphtes3_35b4 . 3 
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscWQWPpWtGyktvhnghegds phks . 

R+RP+ + E++ + +V + ++++ ++ + ++ 
Query 64 RIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTIFAYGQTGSGKTYTM 

F+F +VF++++TQ++ +++ + V+D+++G IF+YG T SGKTYT 
Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHmGIIPRcCHDIFdrldkfqekDhdFW 

G +++GI+PR+++ +FD++ + +++ 

Query 163 QG TEENIGILPRTLNVLFDSLQERL- YTKMNLKPHRSREYLRLSSE 207 

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257 

HMM hVkCSYMEI YNEelYDLLCPnP. . . qhMkpLnlHEHPN 

+V +S++EIYNE+IYDL +P++ Q++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVK 307 

HMM MGpYVqGCTEf HVcSYeDachWIWqGnknRHVAaTnMNdhSSRSHtlFTI 

++++++ V +A +++ +G K+ VA T++N SSRSH+IFT+ 
Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk. qcdehvcHSKMNLVDLAGSERvnrTGAEGQRIKEGcNINqSL 

++ Q + + +++S ++L DLAGSER+ +T+ EG RL+E +NIN SL 
Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407 

HMM t tLGnVInaLaDgqTKYmYgghgHI PYRDSKLTWlLQDSLGGNcKTcMIA 
+TLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI+ 
Query 408 LTLGKCINVLKNSE KSKFQQHVPFRESKLTHYFQSFFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnIJcNkPQINEDPca* 

+I+ + Y+ETL++L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491 






WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_35b5 



group : metabolism 

DKFZphtes3_35b5 encodes a novel 466 amino acid protein with similarity to the bovine accessory 
subunit for vacuolar ATPase and to rat C7-1 protein. 

The vacuolar proton-ATPase (v-ATPase) translocates protons into intracellular organelles or 
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of 
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 seems 
to be enriched in the frontal cortex of aged adult rats. 

The novel protein can find application in modulating v-ATPase activity in endocytic and 
secretory organelles. 

strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain A 

complete cDNA, complete cds, potential start at Bp 8, EST hits 
matches perfectly to I54197 hypothetical protein, but possesses 186 additional aa 
at the N-terminus 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 

1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 

101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 

151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 

201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 

251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 

301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 

351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 

401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 

451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 

501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 

551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 

601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 

651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 

701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 

751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 

801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 

851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 

901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 

951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 



BLAST Results 



No BLAST result 

Medline entries 



95014142: 

A novel accessory subunit for vacuolar H(+)-ATPase from chromaffin 
granules. 

97215246: 

Identification of a rat brain gene associated with aging by 
PCR differential display method. 



Peptide information for frame 2 



ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
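
The ORF coordinates and the peptide length can be cross-checked against the cDNA listing. The sketch below is illustrative only: it assumes 1-based inclusive coordinates that exclude the stop codon, the standard genetic code, and a start at the ATG at bp 8 (the "potential start at Bp 8" noted above). The function names are hypothetical and not part of the original report.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Codon count for an ORF given 1-based inclusive coordinates (stop excluded)."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


def translate(dna, start_bp):
    """Translate from 1-based start_bp until a stop codon or the last full codon."""
    seq = dna.upper().replace(" ", "")
    peptide = []
    for pos in range(start_bp - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# First 40 bp of the DKFZphtes3_35b5 insert, copied from the listing above.
FIRST_40_BP = "GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG"

print(orf_peptide_length(8, 1405))  # codons between bp 8 and bp 1405
print(translate(FIRST_40_BP, 8))    # N-terminal residues of the peptide
```

Under these assumptions the translated fragment agrees with the first residues of the peptide listed for frame 2, and the same length check holds for the other ORFs in this section (368 to 679 bp, 84 to 1529 bp).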



1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 

TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus 
norvegicus C7-1 protein (C7-1) mRNA, complete cds., N = 1, Score = 
2088, P = 3.8e-216 

PIR:A55116 vacuolar ATPase (EC 3.6.1.-) chain Ac45 - bovine, N = 1, 
Score = 2011, P = 5.5e-208 

PIR:I54197 hypothetical protein - human, N = 1, Score = 1464, P = 
5.1e-150 

>TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus 
C7-1 protein (C7-1) mRNA, complete cds. 
Length = 463 

Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216 
Identities = 408/463 (88%), Positives = 426/463 (92%) 



Query: 4 
+R+R G R A LW + LSL A AAA AAEQQV PLVLWS S DRDLWA P ADTHEGH 
Sbjct: 8 

Query: 64 
ITSD+QLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 
Sbjct: 62 

Query: 124 
PSSLVLPAVDWYA+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLIRLPYTASS 
Sbjct: 122 

Query: 184 
GLMAPREVLTGNDEVIGQVLSTL+SEDVPYTAALTAVRPSRVARDVA+VAGGLGRQLLQ 
Sbjct: 182 

Query: 244 
Q SP I H PPV S YNDTAPRI L FWAQN FS VA Y KD+W+ DLT LTFGV+ LNLTGSFWNDSFA 
Sbjct: 242 

Query: 304 
LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+ FN SQVTGPSIY 
Sbjct: 302 MLSLTYEPLFGATVTFKFILASRFYPVSARYWFTMERLEIHSNGSVAHFNVSQVTGPSIY 361 

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423 

SFHCEYVSSLSKKGSLLV PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW 
Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420 

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466 

MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV 
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463 



Pedant information for DKFZphtes3_35b5, frame 2 



Report for DKFZphtes3_35b5 . 2 



[LENGTH] 466 

[MW] 51621.44 

[pI] 5.73 

[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds. 0.0 

[PIRKW] hydrolase 0.0 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] SIGNAL_PEPTIDE 38 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 11.59 % 



SEQ MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH 

SEG xxxxxxxxx 

PRD ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc 

MEM 

SEQ EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL 

SEG 

PRD ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc 

MEM 

SEQ DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT 

SEG xxxxxxxxxxxxxxx. . . 

PRD ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc 

MEM 

SEQ ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL 

SEG xxxxxxxxxxxxxxxxxxxx. . 

PRD cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh 

MEM 

SEQ LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND 

SEG 

PRD hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc 

MEM . 

SEQ SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP 

SEG 

PRD hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc 

MEM 

SEQ SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP 

SEG xxxxxxxxxx 

PRD ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc 

MEM MMMMMM 

SEQ GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhecccccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_35b5 . 2 

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001 
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001 
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001 
PS00001 292->296 ASN_GLYCOSYLATION PDOC00001 
PS00001 299->303 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00004 375->379 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005 
PS00005 374->377 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007 
PS00008 102->108 MYRISTYL PDOC00008 
PS00008 103->109 MYRISTYL PDOC00008 
PS00008 200->206 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 314->320 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35b5.2) 

group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to 
interleukin-7. 

Due to the close relationship to human interleukin-7, the novel interleukin is expected to act 
as a new growth factor for human B lineage cells. Additionally, the protein should induce the 
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and 
subsequently induce both cytotoxic T-cell- and lymphocyte-activated killer cells. 

This new interleukin could find clinical application in a variety of conditions of 
hematolymphopoietic failure and different tumours, because of its recruitment of B cell 
lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells. 



similarity to interleukin-7 precursor 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 



1 GGATGAAAGT GATTTAATTC ATTTTTAGAA TTTTTTTTTT GTTTTGTTTT 
51 AGCAACATGC TGAACAACTA ATTTACTTTA AAAATAAGCC AGTTAAAACA 
101 AAGGACGCTA AGCCCAAGTG GGGGGCAATA TTAGTCAGGA TCTTTGGGGT 
151 CTAATTCCAG ACCAACTTTC AGAAGCACTT CTTTGTCTCT GTTCTCACCT 
201 CTGCTGTCCC TCTCTTCCCT CATCCCCTAA GAGAGACAAA GATAAAAGCC 
251 CACCTGCATC CCTAAGTCTT ACTGAGATCA GCCACCCCAG GGGAGAGAAA 
301 CTGGATCTAC TTACAGCCAC CCCCTGTTTC CATCCATATA CTTACTTCCC 
351 CCAATTTGCA TGTGATTATG GAAACAAGTC ATGCTCATGA AAGCAACTGT 
401 AAAATAAAAG GTTATGGAGT AGTTCAGCAA CTTCTTCACA GCCAGCTTTG 
451 TGGAGCTGGG GAGGACTTAG GGCCCATTGG AGTCTCTTAT GTGTACAGCT 
501 TCAGGGCTGT CCCTTTCAGT TTGATTTTAA GCAATGCCTC ACTTCATAGC 
551 TTAGGGGGTA AGGATTCCAT TCAGGTAGGT TGTCTAAAGG AACTAATGGG 
601 ACCTCTCAGT GAATTAGCTG ACCAGATTTT AGGAAATCTT TTTAATTTCT 
651 ATGATTTTCC TTCTCACATT TTGAAATGGT AAAATTGACT GGAAATAATT 
701 TTTCTTGGTG CCTTATTGGT TTTCCTTGCA AACCTTTCTC ATATTTTCTC 
751 ATGACCATTG CCAGTGACCA AGGCCCATGT GTGTGTTGTG TGTAATTGTG 
801 GGCATGTACA AGCTTAAATA ACGTGCCGAC AGCACTGTTT CAAAGTTGGT 
851 ATTCATTAGG CTGTTGCCTC CTGGGCTGGA GCTGCGCTAA TCCTGACACC 
901 GGCTGCCAGG AGAAAACCTC ATGGATCACA CACCAAACCT TAATAACAGC 
951 ATCCGTGACC TGCACTCTCC AGTACAGAAT GGGAACCCCA GAGCTAGGAA 
1001 ATGTAGTTGT ATATTTTAAT GAACTGCTAC CCCAGCCAAA GAAGCTTCTT 
1051 TCACTTTTGT GCTCTACAGA AAGCCCAAGG GGGGTAGGAG GGACAGAGCT 
1101 TTGAATAACT GCTTTCTAAC ACTAAATGTG GCCAACAGGA CAGAGCACAT 
1151 CACACGTATA GGCAGGTGTG AGGGACAGTG GCTAAGAATT GCCTGCTCCC 
1201 TCTGCATGCT CTTTCTTGTT TCCAAAGTCC AATCAAGTGA TCCTGGGAAA 
1251 CAAATCTGTC TGGATTGCGG AGGGTGGTTC TGAAAGAACT GCCAAGACGT 
1301 TAAAGAAGGG TGAAGAGTAG GCAGAATATA AGTAGCTAAC CTGAGTCAAG 
1351 ACTCTCAAAA GCTAGCAGCC TGATGACAAT AGGATTTATT TCAGCCAGGA 
1401 TAGTGTCTGT CTGTGAGTGC ATCATTTTAA GACAGTATGA CTTCATGTTG 
1451 TTACAAACTA TGTATAGTAT GTATGTTTTG TGGGTTGTAT ATATACATAA 
1501 TATATATTAT ATATATATAT GAGAGATTTG GTGACTTTTG ATACGGGTTT 
1551 GGTGCAGGTG AATTTATTAC TGAGCCAAAT GAGGCACATA CCGAGTCAGT 
1601 AGTTGAAGTC CAGGGCATTC GATACTGTTT ATGATTTCCA TATATGTATA 
1651 GTGCCTATCC CATGCTGTAG TCACTGTTAT GTTAAATCCA GAAGTTACAC 
1701 TAGAGCCAGC GATACTTTAT TTGTAGACAA TCAATTTGAA TCCATATGTT 
1751 ATTACTGGCA GATGATACAT GATTACAGTT CTGAATCTGT AACACTTACA 
1801 AAAGGAAACC CAGAGCAGCT TGATGAGTTT TTGTTTCTGC TTCGTTCCTG 
1851 GGAGTCAGTA GAAACAGCAG TTGTATGTGG TTATGTTAGT CTCAAGATAC 
1901 TTAATTTGTT GACCTTACTT CAGAAAAATT TTGTATGTAT TATATTTGTG 
1951 GGAAGGTAAA ATAATCATTT GAGATTTTTA TCAAATATGA AGATTAGTTA 
2001 TTTATGAAAA ACAAAGAAAT GTCTATTTTT CTTTGTTCCC AATTAATGTA 
2051 GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA 
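
The reported poly A and polyadenylation-signal positions can be reproduced from the last line of the listing. The sketch below is an illustration, not the annotation pipeline actually used: it assumes the positions are 0-based offsets into the insert, that the poly A stretch is the terminal run of A residues, and that the signal is the last AATAAA or ATTAAA hexamer upstream of that run. These conventions are inferred from the sequence; the function name is hypothetical.

```python
import re


def find_polya_features(seq, offset=0, signals=("AATAAA", "ATTAAA")):
    """Return (signal_pos, polya_pos) as 0-based offsets shifted by `offset`.

    polya_pos marks the start of the terminal run of A residues;
    signal_pos marks the last signal hexamer lying entirely upstream of it.
    Either value is None when the feature is not found.
    """
    s = seq.upper().replace(" ", "")
    tail = re.search(r"A+$", s)
    if tail is None:
        return None, None
    polya_start = tail.start()
    # rfind restricts the search to s[0:polya_start], i.e. upstream only.
    signal_start = max(s.rfind(sig, 0, polya_start) for sig in signals)
    signal_pos = offset + signal_start if signal_start >= 0 else None
    return signal_pos, offset + polya_start


# Last line of the DKFZphtes3_35e21 listing (bases 2051 to 2095, 1-based),
# so the first base has 0-based offset 2050.
TAIL = "GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA"

print(find_polya_features(TAIL, offset=2050))
```

With these assumptions the result matches the "Poly A stretch" and "polyadenylation signal" positions given for this clone; the signal found here is the ATTAAA variant rather than the canonical AATAAA.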



BLAST Results 



No BLAST result 






Medline entries 



89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 



Peptide information for frame 2 



ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 



1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score = 66, P = 7.0e-01, identities = 21/70, positives = 33/70 



Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score = 
66, P = 0.72 

TREMBL:PADAL1_1 gene: "dal1"; P.abies dal1 mRNA, N = 2, Score = 59, P 
= 0.77 

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score = 
66, P = 0.79 

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus 
radiata MADS-box protein (PrMADS3) mRNA, complete cds., N = 2, Score = 
59, P = 0.94 

>PIR:B32223 interleukin-7 precursor (clone 1) - human 
Length = 133 



Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 21/68 (30%), Positives = 33/68 (48%) 
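
The percentages printed next to the identity and positive counts can be recomputed from the fractions. The sketch below is illustrative and assumes the percentage is the integer part of 100 × matches / aligned length (truncated, not rounded), which is what the figures in this section suggest; the regular expression and function name are hypothetical.

```python
import re

# Matches e.g. "Identities = 21/68 (30%)" or "Positives = 33/68 (48%)".
LINE_RE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_percentages(line):
    """Map each label to (matches, length, reported_pct, consistent)."""
    results = {}
    for label, num, den, pct in LINE_RE.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        # Integer division truncates, e.g. 21/68 = 30.9% is reported as 30%.
        results[label] = (num, den, pct, 100 * num // den == pct)
    return results


print(check_percentages("Identities = 21/68 (30%), Positives = 33/68 (48%)"))
```

The same check can be applied to the other alignment summaries in this section, for example 408/463 and 426/463 for the C7-1 hit.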

Query: 39 VSYVYSFRAVPFSLIL SNASLHSLGGK--DSIQVGCLKELMGPLSELADQILGNL 91 

VS+ Y F P L+L S+ + GK +S+ + + +L+ + E+ L N 

Sbjct: 4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 

FNF F HI 
Sbjct: 64 FNF--FKRHI 71 

Pedant information for DKFZphtes3_35e21, frame 2 

Report for DKFZphtes3_35e21 . 2 

[LENGTH] 104 

[MW] 11339.12 

[pI] 5.87 

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH 
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc 
SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW 
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc 



Prosite for DKFZphtes3_35e21 . 2 



PS00001 56->60 ASN_GLYCOSYLATION PDOC00001 
PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005 
PS00008 63->69 MYRISTYL PDOC00008 
PS00008 89->95 MYRISTYL PDOC00008 
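
Two of the Prosite hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below is a minimal illustration, assuming the PROSITE consensus patterns PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]) translated to regular expressions, and assuming the report's position convention of 1-based start followed by start plus pattern length. The function and dictionary names are hypothetical.

```python
import re

# 104-residue peptide of DKFZphtes3_35e21, frame 2, copied from the listing above.
PEPTIDE = (
    "METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPF"
    "SLILSNASLHSLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW"
)

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(peptide, pattern):
    """Return (start, start + length) pairs, 1-based, including overlapping hits."""
    hits = []
    # A lookahead lets finditer report matches that overlap each other.
    for match in re.finditer(f"(?=({pattern}))", peptide):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


for name, pattern in PATTERNS.items():
    print(name, scan(PEPTIDE, pattern))
```

Both patterns match frequently by chance in any protein, so hits of this kind indicate candidate sites only.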



(No Pfam data available for DKFZphtes3_35e21.2) 

DKFZphtes3_35g6 



group: testes derived 

DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to R27216_l 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="15" 
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 



1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 
51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 
101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 
151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 
201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 
251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 
301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 
351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 
401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 
451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 
501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 
551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 
601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 
651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 
701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 
751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 
801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 
851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 
901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 
951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 
1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 
1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 
1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 
1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 
1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 
1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 
1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 
1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 
1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 
1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 
1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 
1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 
1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 
1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 
1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 
1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 
1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 
1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 
1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 
1951 AAGAGATGGG TCAGTATTCC TACAGAATTC TTATTAACTC AAATAACTAA 
2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 
2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 
2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 
2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 
2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 
2251 AACTGCACCA CTATACCTGT CTCTCTGTGT GGGGGACACT GCTGATGATT 
2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 
2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 
2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 
2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 
2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 
2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 
2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 
2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 
2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 
2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 
2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 
2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 
2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 
2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 
3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 
3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 
3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 
3151 TAAATATATA TATATATAAA AAAAAAA 



Entry G37753 from database EMBL: 
SHGC-63477 Human Homo sapiens STS genomic. 
Score = 1627, P = 3.0e-66, identities = 327/329 

Entry G37752 from database EMBL: 
SHGC-63476 Human Homo sapiens STS genomic. 
Score = 1578, P = 6.2e-64, identities = 320/324 



ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 
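
The ORF coordinates, reading frame and peptide length quoted in these reports are tied together by simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is consistent with the figures in this listing; `orf_summary` is a hypothetical helper name, not part of any tool named here.

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range."""
    span = end_bp - start_bp + 1          # number of nucleotides in the ORF
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start_bp - 1) % 3 + 1        # frame 1 starts at bp 1, 4, 7, ...
    return frame, span // 3


print(orf_summary(84, 1529))  # ORF quoted for this clone
```

Applied to the ORF from 84 bp to 1529 bp, this gives frame 3 and a length of 482 residues, matching the values reported for this clone.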



1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA 
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG 
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT 
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL 
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA 
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV 
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR 
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC 
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA 
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT 



Entry AC005306_2 from database TREMBL: 
product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, 
complete sequence. 
Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297 

Entry CEF38H4_9 from database TREMBLNEW: 
gene: "F38H4.7"; Caenorhabditis elegans cosmid F38H4 
Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446 

Entry AC004678_1 from database TREMBL: 
product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094, 
complete sequence. 
Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137 



BLAST Results 



Medline entries 



No Medline entry 



Peptide information for frame 3 



BLASTP hits 

Alert BLASTP hits for DKFZphtes3_35g6, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_35g6, frame 3 



Report for DKFZphtes3_35g6.3 

[LENGTH] 482 
[MW] 52771.47 
[pI] 5.79 
[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence. 1e-142 
[BLOCKS] BL01075D Acetate and butyrate kinases family proteins 
[SUPFAM] POZ domain homology 3e-08 
[SUPFAM] A55R protein middle region homology 5e-06 
[SUPFAM] A55R protein 5e-06 
[SUPFAM] A55R protein carboxyl-terminal homology 5e-06 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.20 % 

SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE 

SEG xxxxxxxxxxx 

PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee 

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA 

SEG 

PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch 

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh 

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV 

SEG 

PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceeecccccccccccccchhhhh 

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS 

SEG 

PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee 

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN 

SEG 

PRD eeeccccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc 

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF 

SEG xxxxxx 

PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec 

SEQ YT 
SEG 

PRD cc 



Prosite for DKFZphtes3_35g6.3 



PS00001  394->398  ASN_GLYCOSYLATION  PDOC00001 
PS00001  466->470  ASN_GLYCOSYLATION  PDOC00001 
PS00004  357->361  CAMP_PHOSPHO_SITE  PDOC00004 
PS00004  387->391  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  54->57    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  154->157  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  234->237  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  296->299  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  348->351  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  406->409  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  428->431  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  14->18    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  54->58    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  115->119  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  206->210  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  217->221  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  234->238  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  281->285  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  296->300  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  468->472  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  430->437  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  80->86    MYRISTYL           PDOC00008 
PS00008  110->116  MYRISTYL           PDOC00008 
PS00008  365->371  MYRISTYL           PDOC00008 
PS00008  392->398  MYRISTYL           PDOC00008 
PS00008  402->408  MYRISTYL           PDOC00008 
PS00008  463->469  MYRISTYL           PDOC00008 



(No Pfam data available for DKFZphtes3_35g6.3) 
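
The Prosite hits listed above are short sequence motifs that can be located with ordinary regular expressions. The sketch below translates three of the PROSITE patterns (PS00001, PS00005, PS00006) into Python regexes; the `PATTERNS` table and the `scan` helper are illustrative names, and the reported positions are assumed to be 1-based starts.

```python
import re

# PROSITE-style patterns rewritten as regular expressions (illustrative subset).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",   # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",        # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",       # PS00006: [ST]-x(2)-[DE]
}


def scan(name: str, peptide: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    lookahead = re.compile(f"(?={PATTERNS[name]})")
    return [m.start() + 1 for m in lookahead.finditer(peptide)]


# Residues 391-400 of the frame 3 peptide; the hit maps back to position 394.
print(scan("ASN_GLYCOSYLATION", "LGQNDTGFSC"))
```

A zero-width lookahead is used so that overlapping sites, which are common for these short motifs, are all reported.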
DKFZphtes3_35k16 



group: metabolism 

DKFZphtes3_35k16 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA 
synthetases/ligases. 

The novel protein contains a putative AMP-binding domain signature, which is present in 
enzymes that act via an ATP-dependent covalent binding of AMP to their substrate. This 
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1), 
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3) and bile acid-CoA ligase. It is therefore a 
new fatty acid-CoA synthetase/ligase with unknown substrate. 

The new protein can find application in modulation of fatty acid metabolism and as a new 
enzyme for biotechnological production processes. 



similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at bp 50, 
few EST hits, seems to be a testis-specific cDNA; 
5 of 6 EST hits are from testis-derived libraries 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490 
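
The polyadenylation signal position can be re-derived from the listed cDNA by searching for the canonical AATAAA hexamer upstream of the poly A stretch. The following is a minimal sketch; `find_polya_signal` is a hypothetical helper, the choice of the last occurrence is an assumption, and the function returns a 0-based offset (the positions quoted in these reports appear to follow the same convention, but that is an observation rather than a documented rule).

```python
def find_polya_signal(cdna: str, hexamer: str = "AATAAA") -> int:
    """Return the 0-based offset of the last hexamer occurrence, or -1."""
    # Keep only nucleotide letters so that line numbers and spaces from a
    # formatted listing do not disturb the offsets.
    cleaned = "".join(ch for ch in cdna.upper() if ch in "ACGTN")
    return cleaned.rfind(hexamer)


tail = "CCTGAT AATAAAGCAC TTCAGGGTCC AAAAAAAAAA"
print(find_polya_signal(tail))
```

Because digits and whitespace are filtered out, a whole numbered listing can be pasted in unchanged.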



1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA 
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC 
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG 
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA 
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT 
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA 
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG 
351 GTTTGGAGCG TTTCCACGCA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG 
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG 
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG 
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC 
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA 
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA 
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG 
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC 
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG 
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG 
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG 
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC 
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC 
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA 
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA 
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT 
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC 
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC 
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC 
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA 
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA 
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG 
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA 
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG 
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC 
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA 
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA 
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC 
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG 
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC 
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA 
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA 
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC 
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA 
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA 
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG 
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG 
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA 
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC 
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT 




2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA 
2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 
2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC 
2501 TTCAGGGTCC AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 



1 MTGTPKTQEG AKDLEVDMNK TEVTPRLWTT CRDGEVLLRL SKHGPGHETP 
51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL 
101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH 
151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF 
201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI 
251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA 
301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR 
351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP 
401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK 
451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL 
501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL 
551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD 
601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR 
651 HFVAQKYKKQ IDHMYH 
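
A peptide such as the one above can be cross-checked against the cDNA by translating from the stated start codon with the standard genetic code. This is a minimal sketch that assumes the standard code and a 1-based start position; `translate` and `CODON_TABLE` are illustrative names, not the software used to generate the report.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str, start_bp: int) -> str:
    """Translate from a 1-based start position until the first stop codon."""
    # Drop line numbers and spaces from a formatted listing.
    seq = "".join(ch for ch in cdna.upper() if ch in "ACGT")
    peptide = []
    for pos in range(start_bp - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


first_70 = ("CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA "
            "TGACTGGAAC CCCAAAGACT")
print(translate(first_70, 50))
```

Starting at bp 50 of the first 70 nucleotides of the insert reproduces the first seven residues of the frame 2 peptide.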



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k16, frame 2 

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, 
P = 8.9e-169 

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37Rv), 
N = 2, Score = 532, P = 3.6e-62 

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus 
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59 



>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds. 
Length = 634 

HSPs: 

Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169 
Identities = 319/628 (50%), Positives = 440/628 (70%) 

Query:  38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97 
           LR+    P  + P T+   F E+++++G   AL  K   KWE ++++QYY   R+AAK 
Sbjct:   2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59 

Query:  98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157 
           +KLGL++ H V ILGFNS EWF +AVG + AGG+  GIY T+S EACQY+      N+++ 
Sbjct:  60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119 

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216 
           V+  +QL+KIL I    L  LKA++ Y+ P   K  N+Y+ ++FMELG  +P+  L+ +I 
Sbjct: 120 VDTQKQLEKILKI-MKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178 

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273 
           ++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A  G+   D +  + + E VVSYLPL 
Sbjct: 179 DTQQPNQCCVLVYTSGTTGNPKGVMLSQDNITWTARYGSQAGDIRPAEVQQEVVVSYLPL 238 

Query: 274 SHIAAQMMDIWVPIKIGALTYFAQADALKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKK 333 
           SHIAAQ+ D+W  I+ GA   FA+ DALKG+LV+TL+EV+PT  +GVP++WEKI E +++ 
Sbjct: 239 SHIAAQIYDLWTGIQWGAQVCFAEPDALKGSLVNTLREVEPTSHMGVPRVWEKIMERIQE 298 

Query: 334 NSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHS 393 
            +A+S  +++K  +WA ++  + N     G    P + R+A  LV +KV+ +LG   C 
Sbjct: 299 VAAQSGFIRRKMLLWAMSVTLEQNLT-CPGSDLKPFTTRLADYLVLAKVRQALGFAKCQK 357 

Query: 394 FISGTAPLNQETAEFFLSLDIPIGELYGLSESSGPHTISNQNNYRLLSCGKILTGCKNML 453 
              G AP+  ET  FFL L+I +   YGLSE+SGPH +S+  NYRL S GK++ GC+  L 
Sbjct: 358 NFYGAAPMMAETQHFFLGLNIRLYAGYGLSETSGPHFMSSPYNYRLYSSGKLVPGCRVKL 417 

Query: 454 FQQNKDGIGEICLWGRHIFMGYLESETETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIK 513 
             Q+ +GIGEICLWGR IFMGYL  E +T EAID+EGWLH+GD G+LD  GFLY+TG +K 
Sbjct: 418 VNQDAEGIGEICLWGRTIFMGYLNMEDKTCEAIDEEGWLHTGDAGRLDADGFLYITGRLK 477 

Query: 514 EILITAGGENVPPIPVETLVKKKIPIISNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDK 573 
           E++ITAGGENVPP+P+E  VK ++PIISNAML+GD+ KFLSMLLTLKC ++  + +  D 
Sbjct: 478 ELIITAGGENVPPVPIEEAVKMELPIISNAMLIGDQRKFLSMLLTLKCTLDPDTSDQTDH 537 

Query: 574 LNFEAINFCRGLGSQASTVTEMVKQQDPLVYKAIQQGINAVNQEAMNNAQRIEKWVILEK 633 
           L  +A+ FC+ +GS+A+TV+E+++++D  VY+AI++GI  VN  A      I+KW ILE+ 
Sbjct: 538 LTEQAVEFCQRVGSRATTVSEIIEKKDEAVYQAIEEGIRRVNMNAAARPYHIQKWAILER 597 

Query: 634 DFSIYGGELGPMMKLKRHFVAQKYKKQIDHMY 665 
           DFSI GGELGP MKLKR  V +KYK  ID  Y 
Sbjct: 598 DFSISGGELGPTMKLKRLTVLEKYKGIIDSFY 629 
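
The summary figures in the HSP header can be reproduced with two small calculations. The sketch below assumes that the percentages are truncated to whole numbers (319/628 is about 50.8%, reported as 50%) and uses the usual Poisson relation between the expectation value E and the probability P, namely P = 1 - exp(-E); `percent` and `p_from_expect` are illustrative helper names.

```python
import math


def percent(matches: int, columns: int) -> int:
    """Whole-number percentage, truncated as in the report above."""
    return 100 * matches // columns


def p_from_expect(expect: float) -> float:
    """Poisson probability of at least one hit: P = 1 - exp(-E)."""
    # expm1 keeps precision when E is tiny, where P is practically equal to E.
    return -math.expm1(-expect)


print(percent(319, 628), percent(440, 628))
print(p_from_expect(8.9e-169))
```

For very small E the two values coincide to within floating-point precision, which is why the header shows the same number for Expect and P.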



Pedant information for DKFZphtes3_35k16, frame 2 

Report for DKFZphtes3_35k16.2 



[LENGTH] 666 
[MW] 74344.97 
[pI] 8.67 
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds 1e-176 
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23 
[BLOCKS] BL00455 
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49 
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17 
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34 
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08 
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18 
[PIRKW] duplication 6e-07 
[PIRKW] phosphopantetheine 3e-12 
[PIRKW] multifunctional enzyme 3e-06 
[PIRKW] ligase 6e-08 
[PIRKW] acid-thiol ligase 4e-34 
[PIRKW] transmembrane protein 5e-22 
[PIRKW] monooxygenase 9e-17 
[PIRKW] hydrolase 4e-34 
[PIRKW] peroxisome 9e-15 
[PIRKW] antibiotic biosynthesis 3e-12 
[PIRKW] isomerase 6e-08 
[PIRKW] flavonoid biosynthesis 1e-17 
[PIRKW] magnesium 9e-15 
[PIRKW] ATP 5e-22 
[PIRKW] oxidoreductase 9e-17 
[PIRKW] liver 2e-31 
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07 
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34 
[SUPFAM] gramicidin S synthetase I 6e-08 
[SUPFAM] peptide synthetase ppsE 7e-06 
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12 
[SUPFAM] peptide synthetase ppsD 2e-07 
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09 
[SUPFAM] acetate--CoA ligase 8e-10 
[SUPFAM] acetate--CoA ligase homology 4e-54 
[SUPFAM] surfactin synthetase 3e-12 
[SUPFAM] 4-coumarate--CoA ligase 8e-18 
[SUPFAM] short-chain alcohol dehydrogenase homology 6e-07 
[SUPFAM] acyl carrier protein homology 2e-29 
[PROSITE] MYRISTYL 12 
[PROSITE] AMP_BINDING 1 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] AMP-binding enzymes 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 1.80 % 



SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES 
SEG 
1lci- 

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI 
SEG 
1lci- 

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA 
SEG 
1lci- 

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV 
SEG 
1lci- 

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA 
SEG 
1lci- 

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK 
SEG 
1lci- 

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY 
SEG 
1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE 

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET 
SEG 
1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEETTTTCCEETTTHH 

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII 
SEG xxxxxxxxxxxx 
1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEEECHHHHHHHHHHT-TTE 

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD 
SEG 
1lci- EEEEEEE 

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ 
SEG 
1lci- 

SEQ IDHMYH 
SEG 
1lci- 



Prosite for DKFZphtes3_35k16.2 



PS00001  19->23    ASN_GLYCOSYLATION  PDOC00001 
PS00001  246->250  ASN_GLYCOSYLATION  PDOC00001 
PS00004  332->336  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  4->7      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  24->27    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  30->33    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  218->221  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  261->264  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  308->311  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  335->338  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  358->361  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  370->373  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  558->561  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  30->34    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  52->56    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  173->177  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  196->200  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  206->210  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  210->214  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  308->312  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  478->482  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  591->595  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  659->666  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  658->666  TYR_PHOSPHO_SITE   PDOC00007 
PS00007  597->605  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  3->9      MYRISTYL           PDOC00008 
PS00008  65->71    MYRISTYL           PDOC00008 
PS00008  124->130  MYRISTYL           PDOC00008 
PS00008  130->136  MYRISTYL           PDOC00008 
PS00008  134->140  MYRISTYL           PDOC00008 
PS00008  235->241  MYRISTYL           PDOC00008 
PS00008  239->245  MYRISTYL           PDOC00008 
PS00008  303->309  MYRISTYL           PDOC00008 
PS00008  387->393  MYRISTYL           PDOC00008 
PS00008  421->427  MYRISTYL           PDOC00008 
PS00008  498->504  MYRISTYL           PDOC00008 
PS00008  586->592  MYRISTYL           PDOC00008 
PS00009  74->78    AMIDATION          PDOC00009 
PS00455  227->239  AMP_BINDING        PDOC00427 




Pfam for DKFZphtes3_35k16.2 



HMM_NAME AMP-binding enzymes 

HMM *TYRELNERANRLARHLR3ekGIrPG0iVgIMMDRSMWMIVaMLGIWKAG 
+ + +E +A L+ +G VGI+ +S + ++ G + AG 

Query 82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129 

HMM GAYVPIDPeYPdERIqYMLEDSGArLLITQrh HmqRI PdemwwvdH 

G +V I *E QY+ + ++ +• +L+++ + ♦ IP++++ + 

Query 130 GLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179 

HMM IiviDWe WddlWWHedeeNpqpWvdPeDLAYIIY 

+ I + + + + ++++ + E ++ ++++ A +IY 

Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229 

HMM TSGTTGK PKGVM IEHrNI vNy cqWMnWRYgM t e e DDRI LW Ft S Dp YW FDa 

TSGTTG PKGVM++H NI+ + +++ +T+ +++ ♦ + ++ A 
Query 230 TSGTTGIPKGVMLSHDNITWIAGAVTKDFKLTDKHETVVSYLP-LSHIAA 278 

HMM SVWDMFWpLLnGaTLYIpPeEtRrDPe rWWqYIqRHglTWWylTPSMFRM 

+++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++ 

Query 279 QMMDIWVPIKIGALTYFAQADAL--KGTLVSTLKEVKPTVFIGVPQIWEK 326 

HMM LMpd 

+ + 

Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 376 

HMM psLRhVMFgGEpLsPehWdWWRkrfgf kgRIINMYWPT 

+ + + +++G PL++E+++ ++ + ++I Y+ +■ 
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD— IPIGELYGLS 423 

HMM ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQIQPiGViGE 
E++ T+ + + R +++G+ + + + + +N G IGE 

Query 424 ESSGPHTISNQNN — Y RLLSCGKILTGCKNMLFQQN KDG-IGE 463 

HMM LYIgGWPGVARGYWNRPELTEERFipNPFWPGEYRrGWNrRMYRTGDLAR 
+++ G ++ GY+ + +T E+ + ++ ++GDL++ 
Query 464 ICLWG-RHIFMGYLESETETTEAIDDEGW LHSGDLGQ 499 

HMM WIPDGnlEYLGRID. OQVKIRGYRIELGEIEhqLr .qHPglqEAW* 

+ G+++ G I + G++> + + E+ + ++P 1+ A 
Query 500 LDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPIISNAML 545 
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DKFZphtes3_3Sk24 



group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins. 
The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



unknown ; 

membrane regions: 5 

Summary DKFZphtes3_35k24 encodes a novel 514 amino acid protein. 
No homologues found in bacteria, yeast and C. elegans; specific for 
mammals? 



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2706 bp 

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675 



1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 
51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 
101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 
151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 
201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 
251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 
301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 
351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 
401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 
451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT 
501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 
551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 
601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 
651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 
701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 
751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT 
801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 
851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 
901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 
951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 
1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 
1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA 
1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA 
1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC 
1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT 
1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT 
1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC 
1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 
1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 
1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC 
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC 
1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC 
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAACCT 
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA 
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC 
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC 
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG 
2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT 
2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG 
2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC 
2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT 
2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG 
2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT 
2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA 
2701 AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 



1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC 
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM 
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE 
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT 
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD 
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG 
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK 
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS 
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR 
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE 
501 ADQDPTTSKS TPTN 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k24, frame 1 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_35k24, frame 1 



Report for DKFZphtes3_35k24.1 

[LENGTH] 514 
[MW] 60185.03 
[pI] 8.67 
[PROSITE] MYRISTYL 5 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 6 
[KW] SIGNAL_PEPTIDE 32 
[KW] TRANSMEMBRANE 5 
[KW] LOW_COMPLEXITY 15.37 % 



SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR 

SEG 

PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc 

MEM 

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMM 
SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP 

SEG xxx 

PRD hhhhhhhhhhccccccceeceecccccchhhhhhhhhhccccccccchhhhhhhhhhccc 

MEM MMMMMMMMMMMM 

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE 

SEG 

PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh 

MEM 

SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN 

SEG xxxxxxxxxxxxxx . . . 

PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS 

SEG 

PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec 

MEM 

SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN 

SEG 

PRD cccccccccccccccccccccccccccccccccc 

MEM 
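
The TRANSMEMBRANE count in the report corresponds to the number of separate runs of M in the MEM track. The sketch below shows how such runs could be extracted from a fixed-width track; `membrane_segments` is a hypothetical helper, and it assumes a track in which every residue has exactly one character. The tracks printed above have lost their column alignment, so they cannot be used as input without first restoring the padding.

```python
import re


def membrane_segments(mem_track: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) pairs for each run of 'M' in a MEM track."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"M+", mem_track)]


# Toy track: '.' marks non-membrane residues, 'M' marks membrane residues.
print(membrane_segments("..MMM...MM."))
```

The length of the returned list gives the number of predicted membrane regions.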



Prosite for DKFZphtes3_35k24.1 



PS00001  149->153  ASN_GLYCOSYLATION  PDOC00001 
PS00001  353->357  ASN_GLYCOSYLATION  PDOC00001 
PS00001  364->368  ASN_GLYCOSYLATION  PDOC00001 
PS00001  371->375  ASN_GLYCOSYLATION  PDOC00001 
PS00001  487->491  ASN_GLYCOSYLATION  PDOC00001 
PS00001  493->497  ASN_GLYCOSYLATION  PDOC00001 
PS00004  435->439  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  55->58    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  187->190  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  299->302  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  342->345  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  348->351  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  370->373  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  507->510  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  38->42    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  342->346  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  348->352  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  373->377  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  438->442  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  456->460  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  497->501  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  499->503  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  326->334  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  48->54    MYRISTYL           PDOC00008 
PS00008  79->85    MYRISTYL           PDOC00008 
PS00008  106->112  MYRISTYL           PDOC00008 
PS00008  134->140  MYRISTYL           PDOC00008 
PS00008  159->165  MYRISTYL           PDOC00008 



(No Pfam data available for DKFZphtes3_35k24.1) 
DKFZphtes3_35n12 



group: metabolism 

DKFZphtes3_35nl2 encodes a novel 315 amino acid protein with strong similarity to ADP, ATP 
carrier T (ANT) proteins. 

The novel protein contains three mitochondrial energy transfer signatures and is closely
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein that is
highly abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore
through which ADP is moved across the inner membrane into the matrix and ATP is moved from the
matrix into the cytoplasm.

The new protein can find application in the modulation of ADP transport and energy metabolism in
cells and mitochondria.



strong similarity to ADP/ATP carrier proteins 

EST hits to mouse and Drosophila

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1803 bp 

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772



1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATGCTGCA GCGGTTTTCC 

51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG 

101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA 

151 GAAGCCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC 

201 TGGCCGGCGG AGTCGCGGCA GCTGTGTCCA AGACAGCGGT GGCGCCCATC 

251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG GCGTCGTCGA AGCAGATCAG 

301 CCCCGAGGCG CGGTACAAAG GCATGGTGGA CTGCCTGGTG CGGATTCCTC 

351 GCGAGCAGGG TTTCTTCAGT TTTTGGCGTG GCAATTTGGC AAATGTTATT 

401 CGGTATTTTC CAACACAAGC TCTAAACTTT GCTTTTAAGG ACAAATACAA 

451 GCAGCTATTC ATGTCTGGAG TTAATAAAGA AAAACAGTTC TGGAGGTGGT

501 TTTTGGCAAA CCTGGCTTCT GGTGGAGCTG CTGGGGCAAC ATCCTTATGT 

551 GTAGTATATC CTCTAGATTT TGCCCGAACC CGATTAGGTG TCGATATTGG 

601 AAAAGGTCCT GAGGAGCGAC AATTCAAGGG TTTAGGTGAC TGTATTATGA 

651 AAATAGCAAA ATCAGATGGA ATTGCTGGTT TATACCAAGG GTTTGGTGTT 

701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA 

751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT 

801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT 

851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA 

901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAAGATA TACCAACATG 

951 AAGGAATCAG TTCCTTTTTT CGTGGCGCCT TCTCCAATGT TCTTCGCGGT 

1001 ACAGGGGGTG CTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT 

1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC 

1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC 

1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT 

1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG

1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT 

1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTAAAATTCT TTTTATGATT 

1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA 

1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT 

1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC

1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT TCTATATCTC TTCTAAGACA 

1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT 

1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA 

1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA 

1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA 

1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA

1801 AAA 
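
The poly A stretch and polyadenylation signal positions quoted for this insert can be checked against the last bases of the listing. The sketch below is illustrative only. It assumes that the signal is the canonical AATAAA hexamer and that the quoted positions count from zero; that reading is an inference which happens to reproduce the listed values 1772 and 1793. The function name and the choice to scan only the final 53 bases are hypothetical.

```python
# Last 53 bases of the 1803 bp insert (rows 1751 and 1801 of the listing).
TAIL = (
    "TTAGTTTGTA" "TATTTTGTTG" "ACAATAAAGG" "AAGCTTAACT" "GTTAAAAAAA" "AAA"
)
TAIL_OFFSET = 1750  # zero-based index of the first base of TAIL


def find_polya_features(tail: str, offset: int, signal: str = "AATAAA"):
    """Return (signal_pos, polya_pos) as zero-based insert coordinates.

    signal_pos is None when the hexamer is absent from the tail.
    """
    signal_idx = tail.rfind(signal)      # last hexamer before the A run
    polya_idx = len(tail.rstrip("A"))    # first base of the trailing A run
    signal_pos = offset + signal_idx if signal_idx >= 0 else None
    return signal_pos, offset + polya_idx


print(find_polya_features(TAIL, TAIL_OFFSET))
```

Counting from one instead would shift both values up by one (1773 and 1794).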



BLAST Results 



No BLAST result 



Medline entries 



96289608: 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters. 
Peptide information for frame 2



ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50)

MITOCH_CARRIER (145-155)

MITOCH_CARRIER (242-252)
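
The peptide length follows from the ORF coordinates by simple arithmetic. As a hedged sketch, assuming the coordinates are inclusive and exclude the stop codon (which is what the listed values suggest), the check can be written as follows; the function name is hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for an ORF given inclusive bp coordinates (stop codon excluded)."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF coordinates quoted in this document for three clones.
for start, end in [(128, 1072), (78, 1172), (954, 2774)]:
    print(f"ORF {start}-{end}: {peptide_length(start, end)} residues")
```

For this clone, (1072 - 128 + 1) / 3 = 945 / 3 = 315, in agreement with the stated peptide length.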



1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV 
51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN 
101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR
151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR 
201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM 
251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL
301 YDKIKEFFHI DIGGR
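
The peptide is the conceptual translation of the insert starting at the first base of the ORF. The following minimal sketch assumes the standard genetic code and uses only rows 101 to 150 of the nucleotide listing; `translate` and its parameters are illustrative names, not part of any tool used to produce this report.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start: int, first_pos: int = 1) -> str:
    """Translate dna from 1-based position `start` until a stop codon or the end.

    `first_pos` is the listing position of dna[0].
    """
    offset = start - first_pos
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Positions 121-150 of the insert; the ORF starts at position 128.
snippet = "CTCCATCATG" "CATCGTGAGC" "CTGCGAAAAA"
print(translate(snippet, start=128, first_pos=121))
```

The snippet covers only the first seven codons, which read MHREPAK and match the start of the listed peptide.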

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n12, frame 2

PIR:S37210 ADP,ATP carrier protein T1 - mouse, N = 1, Score = 1127, P =
2.7e-114

PIR:A44778 ADP,ATP carrier protein T1 - human, N = 1, Score = 1125, P =
4.4e-114

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila
melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P
= 5.6e-114

PIR:XWBO ADP,ATP carrier protein T1 - bovine, N = 1, Score = 1121, P =
1.2e-113



>PIR:S37210 ADP,ATP carrier protein T1 - mouse
Length = 298

HSPs:



Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114
Identities = 214/293 (73%), Positives = 248/293 (84%)
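
The percentages in the HSP header can be recomputed from the two counts. The sketch below assumes that the reported figures are truncated to whole percent rather than rounded; this is an inference from the values in this document (248/293 is 84.6 % and is listed as 84 %), not a documented property of the BLAST version used.

```python
def percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero."""
    return 100 * matches // aligned_length


identities, positives, length = 214, 248, 293
print(f"Identities = {identities}/{length} ({percent(identities, length)}%)")
print(f"Positives = {positives}/{length} ({percent(positives, length)}%)")
```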



Query:  17 ASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMVDCLVRIPRE 76
           A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E
Sbjct:   5 ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE 64

Query:  77 QGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAG 136
           QGF SFWRGNLANVIRYFPTQALNFAFKDKYKQ+F+ GV++ KQFWR+F  NLASGGAAG
Sbjct:  65 QGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDRHKQFWRYFAGNLASGGAAG 124

Query: 137 ATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGI 196
           ATSLC VYPLDFARTRL  D+GKG  +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI
Sbjct: 125 ATSLCFVYPLDFARTRLAADVGKGSSQREFNGLGDCLTKIFKSDGLKGLYQGFSVSVQGI 184

Query: 197 IVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRMMMQSGE 256
           I+YRA+YFG YDT KG+LP PK    +VS+ IAQ VT  +G++SYPFDTVRRRMMMQSG
Sbjct: 185 IIYRAAYFGVYDTAKGMLPDPKNVHIIVSHMIAQSVTAVAGLVSYPFDTVRRRMMMQSGR 244

Query: 257 --AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF 307
             A   Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++
Sbjct: 245 KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297





Pedant information for DKFZphtes3_35n12, frame 2



Report for DKFZphtes3_35n12.2

[LENGTH] 315
WO 01/12659



PCT/IB00/01496 



[MW] 35022
[pI] 9.91
[HOMOL] PIR:S37210 ADP,ATP carrier protein T1 - mouse 1e-115
[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14
[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14
[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14
[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14
[FUNCAT] 07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w] 1e-13
[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 4e-13
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10
[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10
[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06
[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06
[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKR052c] 2e-06
[BLOCKS] BL00215B Mitochondrial energy transfer proteins
[BLOCKS] BL00215A Mitochondrial energy transfer proteins
[PIRKW] duplication 1e-115
[PIRKW] phosphate transport 2e-09
[PIRKW] heart 3e-24
[PIRKW] transmembrane protein 1e-115
[PIRKW] mitochondrial inner membrane 7e-72
[PIRKW] transport protein 4e-08
[PIRKW] acetylated amino end 1e-115
[PIRKW] adipose tissue 5e-13
[PIRKW] mitochondrion 1e-115
[PIRKW] alternative splicing 2e-09
[PIRKW] methylated amino acid 1e-115
[PIRKW] chloroplast 2e-14
[PIRKW] homodimer 1e-115
[SUPFAM] hypothetical protein YFR045w 3e-07
[SUPFAM] ADP,ATP carrier protein 1e-115
[SUPFAM] Bt1 protein 2e-14
[SUPFAM] ADP,ATP carrier protein repeat homology 1e-115
[SUPFAM] probable carrier protein YPR021c 1e-12
[PROSITE] MITOCH_CARRIER 3
[PFAM] Mitochondrial carrier proteins
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 4.76 %



SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE

SEG 

PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ

SEG 

PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc 

MEM 

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD 

SEG xxxxxxxxxxxxxxx 

PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc 

MEM 

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS



PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec 

MEM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL 

SEG 

PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee 

MEM MMMMMMMMMMM 

SEQ YDKIKEFFHIDIGGR 

SEG 

PRD hhhhhhheeeecccc 

MEM 
Prosite for DKFZphtes3_35n12.2

PS00215 40->50 MITOCH_CARRIER PDOC00189
PS00215 145->155 MITOCH_CARRIER PDOC00189
PS00215 242->252 MITOCH_CARRIER PDOC00189
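
The three Prosite hits can be mapped back onto the peptide to see which residues they cover. The sketch below is an illustration under two assumptions: that start positions count from one and that the second number is one past the last matched residue (inferred from the listing, where each span is as long as the ten-residue signature), and that OCR-damaged residues of the peptide are read as in the SEQ rows of the Pedant report. `hit_segment` is a hypothetical helper.

```python
# Frame 2 peptide of the clone (315 residues).
PEPTIDE = (
    "MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQV"
    "QASSKQISPEARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALN"
    "FAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAGATSLCVVYPLDFAR"
    "TRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGIIVYR"
    "ASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRM"
    "MMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL"
    "YDKIKEFFHIDIGGR"
)

# (accession, start, end, name) rows from the Prosite table.
HITS = [
    ("PS00215", 40, 50, "MITOCH_CARRIER"),
    ("PS00215", 145, 155, "MITOCH_CARRIER"),
    ("PS00215", 242, 252, "MITOCH_CARRIER"),
]


def hit_segment(peptide: str, start: int, end: int) -> str:
    """Residues start..end-1 (1-based), treating `end` as one past the match."""
    return peptide[start - 1:end - 1]


for accession, start, end, name in HITS:
    print(accession, name, f"{start}->{end}", hit_segment(PEPTIDE, start, end))
```

Under these assumptions each hit begins with a proline followed two residues later by an acidic residue (PIERVKLLLQ, PLDFARTRLG, PFDTVRRRMM).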



Pfam for DKFZphtes3_35n12.2



HMM_NAME Mitochondrial carrier proteins 

HMM * p Fwkd FLAGG I AGWMe HTvMFP I Dt I KT RMQIQgEMpM . . ahpRYkGMI 

+F+KD+LAGG+A+++++T+++PI+++K+++Q+Q +++ RYKGM+ 
Query 19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67

HMM dCFRwI wkNEGWRGLWRGLgANvIRYI PqWal RFGFYEFMKeMFiDyf ge 

DC+ +I++++G++++WRG++ANVIRY+P++A++F+F++ +K + F + +++ 
Query 68 DCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117 

HMM ddn y WmWFwmn YMaGsmAGEwi s V I i t Y PMWvVKTRLQa DqkHphsQp . R 

++W+WF* N+++G++AG ++S+ ++YP+++++TRL D +++++ R 
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD--IGKGPEER 164

HMM h YNGvWNcW r k I Y Re EGg FkGL YRGW t PTWMRM I PYqmiYFf vYEtLKeW 

+++G+ +C KI +++G ++GLY+G++ +++++I+Y++ YF++Y+T K + 
Query 165 QFKGLGDCIMKIAKSDG-IAGLYQGFGVSVQGIIVYRASYFGAYDTVKGL 213

HMM lynYtgYnPgprelCMddsPwWhWilgWmlAGMiaWivSYPfDVVRTRMM 
L +++ + ++++++I++ + * ++++J+SYPFD+VR+RMM 

Query 214 LP KPK-- KTPFLVSFFIAQVVT-TCSGILSYPFDTVRRRMM 2 51 

HMM Mdsm. edhkYqSmlDCWMql YKnEGFkGFWKGFWPRIMRiMPWtAIMFml 

M+S+ ++++Y+++LDC+++1Y++EG+ +F++G+ +++R+ ++A+++++ 
Query 252 MQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGT-GGALVLVL 300 

HMM YEqMKwFL*
Y+ +K+F+ 

Query 301 YDKIKEFF 308 






DKFZphtes3_35n24 



group: testes derived 

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins.

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a
domain approximately one hundred amino acids long that includes a conserved intra-domain
disulfide bond (the Ig domain). Thus, the novel protein is a new member of the Ig superfamily.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown

Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560 



1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 

51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 

101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 

151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 

201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 

251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC

301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 

351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 

401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 

451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 

501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 

551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 

601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 

651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC 

701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT 

751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 

801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 

851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 

901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC 

951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 

1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 

1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 

1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 

1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 

1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 

1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 

1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 

1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 

1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 

1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 

1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 

1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry



Peptide information for frame 3 



ORF from 78 bp to 1172 bp; peptide length: 365 
Category: putative protein 
Prosite motifs: IG_MHC (35-42)



1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH

51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF 

101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE 

151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY 

201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN

251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD

301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI 

351 QELLSLISTE DHPIT 

BLASTP hits



No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35n24, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35n24, frame 3 



Report for DKFZphtes3_35n24.3

[LENGTH] 365
[MW] 41768.24
[pI] 5.82
[BLOCKS] BL00273 Heat-stable enterotoxins proteins
[PROSITE] MYRISTYL 1
[PROSITE] IG_MHC 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 7
[PROSITE] TYR_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 3
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.11 %

SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL

SEG

PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK

SEG xxxxxxxxxxxxxxx

PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL

SEG

PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV

SEG

PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD

SEG

PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE

SEG

PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc

SEQ DHPIT

SEG

PRD ccccc



Prosite for DKFZphtes3_35n24.3

PS00001 168->172 ASN_GLYCOSYLATION PDOC00001
PS00001 272->276 ASN_GLYCOSYLATION PDOC00001
PS00001 322->326 ASN_GLYCOSYLATION PDOC00001
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006
PS00006 69->73 CK2_PHOSPHO_SITE PDOC00006
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006
PS00007 85->93 TYR_PHOSPHO_SITE PDOC00007
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007
PS00007 185->194 TYR_PHOSPHO_SITE PDOC00007
PS00008 275->281 MYRISTYL PDOC00008
PS00009 11->15 AMIDATION PDOC00009
PS00290 35->42 IG_MHC PDOC00262



(No Pfam data available for DKFZphtes3_35n24.3) 
DKFZphtes3_35n9



group: metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human
carboxylesterase (EC 3.1.1.1).

The novel protein contains both a carboxylesterase B1 and a carboxylesterase B2 pattern. In
comparison to EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 459-474 are missing.

The new protein can find application in modulation of carboxylester metabolism and as a new 
enzyme for biotechnologic production processes. 



carboxylesterase, splice variant

5' extension of mRNA and N-terminal elongation of protein (64 aa),
missing exon: aa 458-474 of JC5408 are missing

Sequenced by DKFZ 

Locus: unknown

Insert length: 2888 bp 

Poly A stretch at pos. 2878, no polyadenylation signal found 



1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 

151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 

201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 

301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 

351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAACTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 

451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG 

501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 

551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 

601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 

651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA 

701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT

751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 

801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 

851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 

901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 

951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 

1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 

1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 

1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 

1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 

1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 

1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 

1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 

1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 

1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 

1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 

1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 

1551 GAAGGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 

1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 

1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 

1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 

1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 

1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 

1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 

1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 

1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 

2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 

2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 

2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 

2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 

2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 

2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 

2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC 

2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC 

2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC 

2451 TACGAGTTCC AGCATCAGCC CAGCTCGCTC AAGAACATCA GGCCACCGCA 

2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA 

2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT 

2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT 
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG
2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA 
2851 CTAAGGAGAA ACAAGTTGAT TCCTTCATAA AAAAAAAA 



BLAST Results 



Entry D50579 from database EMBL:

Homo sapiens mRNA for carboxylesterase, complete cds.
Score = 7197, P = 0.0e+00, identities = 1441/1443

Entry JC5408 from database PIR:
carboxylesterase (EC 3.1.1.1) - human

Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559,
frame +3



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295)
CARBOXYLESTERASE_B_2 (185-196)



1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ 
51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT
101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR
151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE
201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS
251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS
301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE
351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP
401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG
451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY
501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG
551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP 
601 EERHTEL 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35n9, frame 3

PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808,
P = 1.9e-292

TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human
carboxylesterase (hCE-2) mRNA, complete cds., N = 1, Score = 2761, P =
1.8e-287

PIR:A34329 60K esterase (EC 3.1.1.-) isoform 2 - rabbit, N = 1, Score =
1985, P = 3.1e-205

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus
norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score =
1984, P = 4e-205



>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human
Length = 559

HSPs:

Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292
Identities = 542/559 (96%), Positives = 543/559 (97%)

Query:  65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124
           MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG
Sbjct:   1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184
           IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS
Sbjct:  61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244
           EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304
           VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364
           PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424
           LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484
           KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528
           LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588
           EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540

Query: 589 ALPQKIQELEEPEERHTEL 607
           ALPQKIQELEEPEERHTEL
Sbjct: 541 ALPQKIQELEEPEERHTEL 559
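
The segment of JC5408 that is absent from the splice variant can be read off the alignment row that starts at query position 485. The following sketch assumes that the gap in the query row is written with dashes (16 columns here) and that the subject row starts at residue 421; `query_gaps` is a hypothetical helper, not part of BLAST.

```python
QUERY_ROW = "LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH" + "-" * 16 + "VKFTEEE"
SBJCT_ROW = "LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE"
SBJCT_START = 421  # residue number of the first subject residue in this row


def query_gaps(query_row: str, sbjct_row: str, sbjct_start: int):
    """Return (first, last) subject residue numbers covered by each query gap."""
    gaps = []
    pos = sbjct_start
    run_start = None
    for q, s in zip(query_row, sbjct_row):
        if q == "-" and s != "-":
            if run_start is None:
                run_start = pos          # a new gap run begins here
        elif run_start is not None:
            gaps.append((run_start, pos - 1))
            run_start = None
        if s != "-":
            pos += 1                     # advance only over subject residues
    if run_start is not None:
        gaps.append((run_start, pos - 1))
    return gaps


print(query_gaps(QUERY_ROW, SBJCT_ROW, SBJCT_START))
```

With these inputs the gap spans subject residues 458 to 473, and the next aligned column (subject residue 474) is an I/V substitution, which is consistent with the region described as missing in the summary of this clone.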



Pedant information for DKFZphtes3_35n9, frame 3 



Report for DKFZphtes3_35n9.3

[LENGTH] 607
[MW] 67051.20
[pI] 6.11
[HOMOL] PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0
[BLOCKS] BL01173A Lipolytic enzymes "G-D-X-G" family, histidine
[BLOCKS] BL00122G
[BLOCKS] BL00122F
[BLOCKS] BL00122E
[BLOCKS] BL00122D Carboxylesterases type-B serine proteins
[BLOCKS] BL00122C Carboxylesterases type-B serine proteins
[BLOCKS] BL00122B Carboxylesterases type-B serine proteins
[BLOCKS] BL00122A Carboxylesterases type-B serine proteins
[SCOP] d1akn 3.56.1.1.4 Bile-salt activated lipase [Bovine (Bos taurus 1e-158
[SCOP] d2ack 3.56.1.1.1 Acetylcholinesterase [Electric ray (Torped 1e-170
[SCOP] d1thg 3.56.1.9.7 type-B carboxylesterase/lipase [fungu 1e-149
[EC] 3.1.1.13 Sterol esterase 1e-52
[EC] 3.1.1.7 Acetylcholinesterase 5e-74
[EC] 3.1.1.1 Carboxylesterase 0.0
[EC] 3.1.1.8 Cholinesterase 5e-68
[EC] 3.1.1.59 Juvenile-hormone esterase 1e-34
[EC] 3.1.1.3 Triacylglycerol lipase 3e-52
[PIRKW] duplication 2e-47
[PIRKW] homotetramer 3e-67
[PIRKW] transmembrane protein 9e-44
[PIRKW] microsome 1e-130
[PIRKW] pancreas 3e-52
[PIRKW] endoplasmic reticulum 1e-134
[PIRKW] homotrimer 1e-134
[PIRKW] phosphatidylinositol linkage 5e-74
[PIRKW] synapse 3e-73
[PIRKW] liver 1e-131
[PIRKW] heparin binding 3e-52
[PIRKW] phosphoprotein 7e-25
[PIRKW] glycoprotein 1e-134
[PIRKW] thyroid hormone biosynthesis 2e-47
[PIRKW] carboxylic ester hydrolase 0.0
[PIRKW] monomer 2e-42
[PIRKW] disulfide bond 2e-31
[PIRKW] mammary gland 3e-52
[PIRKW] alternative splicing 5e-74
[PIRKW] iodine 2e-47
[PIRKW] pyroglutamic acid 6e-39
[PIRKW] hydrolase 1e-135
[PIRKW] muscle 3e-73
[PIRKW] thyroid gland 2e-47
[PIRKW] membrane protein 3e-73
[PIRKW] neurotransmitter degradation 3e-73
[PIRKW] cholesterol 3e-52
[PIRKW] homodimer 2e-47
[PIRKW] nerve 3e-73
[SUPFAM] cholinesterase 0.0
[SUPFAM] triacylglycerol lipase 1e-32
[SUPFAM] cholinesterase homology 0.0
[SUPFAM] thyroglobulin 2e-47
[SUPFAM] thyroglobulin type I repeat homology 2e-47
[SUPFAM] juvenile-hormone esterase 2e-35
[SUPFAM] probable lipolytic protein ybaC 1e-07
[PROSITE] CARBOXYLESTERASE_B_2 1
[PROSITE] CARBOXYLESTERASE_B_1 1
[PFAM] Carboxylesterases
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 3.95 %





SEQ MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET

SEG xxxxxxxx ■ . . 

lacj- 

SEQ SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ 

SEG xxxxx 

lacj- ETTEEEECEEEEETTEE — EE 

SEQ TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS

SEG

lacj- EEEEEECEETTTGGGTTTCCEECCCCCCEEECCCCCCBCCCCCCTTTTTT-HHHHHCCCC

SEQ DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ

SEG

SEQ YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS

SEG

lacj- CCCCGGGCCCTTTTTTTCCHHHHHHHHHHHHHHHCGGGGCEEEEEEEEEEECHHHHHHHH

SEQ LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS

SEG 

SEQ KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI 

SEG 

lacj- HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEEEEETBTHHHHHHTTTTT

SEQ YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF 

SEG 

lacj- TTTCCCCCHHHHHHHHHHHTTTTCHHHHHHHHHHCTTTTTTTHHHH-HHHHHHHHHHHHH 

SEQ VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA

SEG

lacj- HHHHHHHHHHHHCCCCEEEEEECCCCGGGTTBTTTHHHCGGGCCCHHHHHHHHHHHHHHH

SEQ NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP

SEG xxxxx 

lacj - HHHKHCCCCCCC — CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH 

SEQ EERHTEL 

SEG xxxxxx . 

l«j- 



Prosite for DKFZphtes3_35n9.3

PS00122 279->295 CARBOXYLESTERASE_B_1 PDOC00112
PS00941 185->196 CARBOXYLESTERASE_B_2 PDOC00112



Pfam for DKFZphtes3_35n9.3



HMM_NAME Carboxylesterases

HMM *MfMnwlimFLLwmItwIi . WheqapcpPdPyiVdtnnCGklRGmNedtD 

+ +-L+++ ++++++++ ++Q++++P I T+ G + G ++ + 
Query 69 RLRARLSAVACGLLLLLVRGQGQDSASP— IRTTHT-GQVLGSLVHVK 113 

HMM NG. .pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW 

+ + +FLGIP+A+PP+G LRF +P+P +E W++V++ T+ P MC+Q+ 
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162

HMM ndFGFWIFdmiertWNeniP. . eMS EDC L Y LNVWT PWn r k PNs k LPVMVW I 

-f++ +++N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI 
Query 163 LTAV — ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210 

HMM HGGGFMFGSGhsYPliqYDgeylMMeeNVIWtlNYRLGPFGFLSTgDid 
HGG+++FG + ++YDG+ L++ ENV+W I+YRLG++GF+STGD + 

Query 211 HGGALVFGMA SLYDGSMLAALENVVWI IQYRLGVLGFFSTGDKH 255 

HMM IPPHGNWGLWDQRMALQWVQDNIAnFGGDPNNITIFGESAGGMSVHIHML 

+ GNWG++DQ++AL+WVQ+NIA+FCG+P+++TIFGESAGG+SV+ ++ 
Query 256 AT — GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303 

HMM S YGG DNPPraf KqLFHRAI MQSGsAmc PWvIQan yN« RqRAf RFA r imGCN 
S P + +LFH AIM+SG A+ P++I S++ + +A++ C+ 

Query 304 S PISQGLFHGAIMESGVALLPGLIASSA— DVISTVVANLSACD 34 5 

HMM rroDs S EMI qC LRs K pwEELWdAt Wn FWmW f Y f P Fl PW FFg PVI DGDDa P E 

+ DS-H-++ CLR K+ EE++++++ *F + + +DG+ 
Query 34 6 QVDS EALVGC LRGKSKEEI LA INK PFKMIPGV VDGV 381 

HMM aFI POHPeeMI kEGkFnDVPWI IGYNnDEGiWFapMmMnfnWf dEDeWId 

F+P+HP+E++++ F VP I+G+NN E++H++P M + + +E++ 
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLI PKVMRI YDT-QKEMDR 429 

HMM itNedWyeWNPYIlFYrddfflSNikDMDDYiDkvyEeYPgWWDrFPqESYW 

++ ■+■ ++ M +L + ♦ + D ++EEY+G+ + PQ 

Query 430 EASQAALQKMLTLLHLPPT-F GDLLREEYIGDNGD- PQTLQA 469 

HMM n LqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFmWR 

++Q+M+ D F++P + ++H++ +PVY+YEF+H PS + 
Query 470 QFQEMMADSMFVIP— ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511 

HMM WWPpWMgvdH' 

+PP+M++DH 
Query 512 IRPPHMKADH 521 

HMM •tEEEiisaMRniMMMYWINFAKhGHPNnthnglCWWPqYTsnEQYdMIMe 
TEEE+ +S R MM+YW+HFA++GNPN++ GL++WP ++++EQY++ + 
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570

HMM CIIraiQmCrmrDPYCNFW* 
+ +++♦+ + FW 

Query 571 QPAVGRALKAHR—LQFW 586 
DKFZphtes3_35p17



group: testes derived 

DKFZphtes3_35p17 encodes a novel 505 amino acid protein with weak similarity to
proteins of the armadillo family.

Proteins of the armadillo family are involved in diverse cellular processes in higher
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions in
intercellular junctions and signalling cascades. Others, belonging to the importin-alpha
subfamily, are involved in NLS recognition and nuclear transport, while some members of the
armadillo family have as yet unknown functions. The novel protein shows similarity to S.
cerevisiae protein Yel013p (VAC8) and Danio rerio beta-catenin, but contains no armadillo (arm)
repeats.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to S. cerevisiae VAC8

complete cDNA, complete cds, few EST hits

Sequenced by DKFZ

Locus: unknown

Insert length: 1966 bp 

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935 



1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT
51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT
101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG
151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG
201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA
251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG
301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GACTCATACG
351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT
401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC
451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG
501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA
551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC
601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC
651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG
701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG
751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT
801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG
851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG
901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT
951 TCTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT
1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA
1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG
1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT
1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA
1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT
1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA
1301 TCATCTAGCA CAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG
1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA
1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT
1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA
1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT
1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA
1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA
1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA
1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA
1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC
1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA
1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT
1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC
1951 CTTCCCAAAA AAAAAA



BLAST Results 



No BLAST result 
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Medline entries 



Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and 
importin alpha is associated with the yeast 
vacuole membrane. 

98330438: 

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces
cerevisiae vacuolar membrane that functions in vacuole fusion and inheritance.

98158703:

Vac8p, a vacuolar protein with armadillo repeats, functions in both
vacuole inheritance and protein targeting from the
cytoplasm to vacuole.



Peptide information for frame 3 



ORF from 99 bp to 1613 bp; peptide length: 505
Category: similarity to known protein 
Classification: unset 



1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH
51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR
101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL
151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI
201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN
251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG
301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL
351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR
401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ
451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE
501 KARYT
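
The peptide above is the conceptual translation of the insert in reading frame 3, beginning at the ATG at position 99. The following is a minimal sketch of how such a translation can be reproduced, assuming the standard genetic code and 1-based coordinates. The function name `translate_orf` is illustrative and is not part of the original annotation pipeline.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
# Assumption: the source does not state which code table was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna: str, start: int) -> str:
    """Translate `dna` from 1-based position `start` up to the first stop codon."""
    seq = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks unreadable codons
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 99-128 of the insert, copied from the sequence listing.
orf_start = "ATGGTGAATATACTTGATTCTCCACACAAG"
print(translate_orf(orf_start, 1))  # MVNILDSPHK
```

The ten residues obtained agree with the first block of the peptide listing.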



BLASTP hits



No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35p17, frame 3

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 237, P = 7.8e-17

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215,
P = 4.9e-14

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA,
complete cds., N = 1, Score = 195, P = 5.8e-12



>PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae)
Length = 578

HSPs:

Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17
Identities = 106/401 (26%), Positives = 177/401 (44%)

Query: 92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSEMEQLQ 151 

+GG PL A +N+ + L E Y + E ++E ++ L S++ Q+Q 

Sbjct: 45 SGG-PLKALTTLVYSONLNLQRSAALAFAEITEKYVR0VSRE-VLEPILILLQSQDPQIQ 102 

Query: 152 EHCAMAI YQCAEDKETRDLVRLHGGLKPLASLLKNTDNKERLAAVTGAIMKCSISKENVT 211 

A+ A + E + L+ GGL+PL ♦ + DM E G I + +N 

Sbjct: 103 VAACAALGNLAVNNENKLLI VEMGGLEPLI NQMMG- DNVEVQCNAVGC I TNLATRDDNKH 161 

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQEREHRVIVRKCGGIQPLVNLLVGIN 271 

K A+ L L + V N GAL ENR + G *■ LV+LL + 

Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221 

Query: 272 QALL VNVTKAVGAC A VEPESMMI IDRLDG— VRLLWSLLKNPHPuVKASAAWALCPCIKN 329 
+ T A+ AV+ + + + + V L SL+ +P VK A AL + 
Sbjct: 222 PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 281


Query: 330 AKDAGEMVRSFVGGLELIVNLLKSDNKE-VLASVCAAITNI AKDQENLAVITDHGVV-PL 387
           E+VR+ GGL +V L++SD+ VLASV A I NI + N +1 D G + PL
Sbjct: 282 TSYQLEIVRA GGLPHLVKLIQSDSI PLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 338

Query: 388 LSKLANTNNNKLRHHLAEA I S RCCMWG-RNRVAFGEHKAVAPLVRY LKSN DTNVHRATAQ 446
           + L ++ +++ H +NR F E AV + +V ++
Sbjct: 339 VRLLDYKDSEEICCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 397

Query: 447 A L YQLS E D A D- NC I TMH ENG A V K LL L DMVGS POQDLQEAAAGC I SNI 492
           A + + AD + + + E* + M S +Q++ AA ++N+
Sbjct: 398 ACFAILALADVSKLOLLEANILDALIPMTFSQNQEVSGNAAAALANL 444




Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14
Identities = 81/341 (23%), Positives = 163/341 (47%)

Query: 163 EDKETROLVRLHGGLKPLAS LLKNT D-NKERLAAVTGAI WKCS 1 SKEMVTKFREYKAI ET 221
           EDK+ D G LK L +L+ + + N +R AA+ A I*** V + + +£
Sbjct: 36 EDKDQLDFYS-GGPLKALTTLvYSDNLNLQRSAALAFA -EITEKYVRQVSR-EVLEP 89

Query: 222 L VG L LT DQ P EE VL VNWGALG EC CQE REN RVIVRKCGGIQP L VN L L VG I NQALL VH VT KA 281
           ++ LL Q ++ V ALG EN++++ + GG++PL+N ++G N *+ N
Sbjct: 90 ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 149

Query: 282 VGACAVEPESMMI I DRLDGVRLLWSLLKNPHPDVKASAAWALCPCI KNAKDAGEMVRSFV 341
           + A ++ I + L L K+ H V+ +A AL + ++ E+V +
Sbjct: 150 I TN LAT RDDN KH K X ATS G AL I P LT KL AKS KH I RVQRN ATG A L L NMTH S E EN R K E L VN A --

Query: 342 GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI--TDHGWPLLSKLANTNNHKL 399
           G + ++V+LL S + +V A++NIA D+ N + T+ +V L L ++ ++++
Sbjct: 209 GAVPVLVSLLSSTDPDVQYYCTTALSNI AVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 400 RHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLRSNDTNVHRATAQALYQLSEDADSCI 459
           ++ A+ ++ + LV+ + A+ ++ +S N
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327

Query: 460 TMH ENGAVKLLLDMVGS PDQDLQEAAAGC I SNI RRLALATEKAR 503
           + + G +K L+ D + £ +S +R LA ++EK R
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSE-- EIQCHAVSTLRNLAASSEKNR 369




Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10
Identities = 80/346 (23%), Positives = 142/346 (41%)

Query: 145 S EN EQLQEHCAHA I YQCAEDKETRDLVRLHGGLKPLAS LLNNTDNKERLAAVTGAI WKCS 204
           S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ +
Sbjct: 58 SDNLNLQRSAALAFAEITE-KYVRQVSR -- EVLEPILILLQSQDPQIQVAACA-ALGNLA 113

Query: 205 ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLV 264
           ++ EN E +E L+ + EV N VG + +N+ + G + PL
Sbjct: 114 VNNEN KLLI VEMGGLE PLI NQMMG DNVEVQCN AVGC I TN LAT RDDN KH K I AT S GAL I PLT 173

Query: 265 NLLVGINQALLVNVTKAVGACAVEPESMMI IDRLDGVRLLWSLLKNPHPDVKASAAHALC 324
           L + + N T A+ E* + V +L SLL + PDV+ AL
Sbjct: 174 KLAK S KH I RVQRN AT GALLNMTH SEENRKELVN AG AV PVLVSLLSSTDPDVQYYCTTALS 233

Query: 325 PC I KNAKDAGEMVRS FVGGLELI VNLLKS DNKEVLAS VCAAI TN I AKDQEN LAV ITDHGV 384
           ++++++ + +V+L+ S + V A+ N+A D I G
Sbjct: 234 NIAVDEANRKKLAQTEPRLVSKLVSLMOSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 293

Query: 385 VPLLSKLANTNNNKLRHHLAEAISRCCKWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 444
           +P L KL +++ L I + K * + PLVR L D+ +
Sbjct: 294 LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 353

Query: 445 A-QALYQLS E DAD- NC I TMHENGAVKLLLDMVGS PDQDLQEAAAGC IS 490
           A L L+ ++ N E+GAV* ++ +0 + C *•
Sbjct: 354 AVSTLRNLAASS EKNRKE FFESGAVEKCKELALDS PVSVQSEI SACFA 401




Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08
Identities = 88/401 (21%), Positives = 175/401 (43%)

Query: 60 LYEARD -- VEVARCGALALWSCSKSHTNKEAIRKAGGI -PLLARLLKTSHENMLIPWGT 116
           L *++D ++VA C AL + + ++ NK I + GG+ PL+ +++ + E + VG
Sbjct: 93 LLQSQDPQIQVAACAALG- -N LAVNNENKLL I VEMGGLEPL I NQMMGDNVE - VQCN AVGC 149

Query: 117 LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 175
           + A+ t+ + I + L K S++ ++Q + A+ +E R +LV G
Sbjct: 150 I TNLATRDDNKHK I ATS GAL I PLTKLAKSKH I RVQRN ATGALLNHTHS EENRKELVNA-G 206

Query: 176 GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR-- EYKAIETLVGLLTDQPEEV 233
           + L SLL++TD + T A+ ++ + N K E + ++ LV L+ V
Sbjct: 209 AVPVLVSLLSSTOPDVQYYCTT-ALSHIRVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 234 L V N V VGA LGEC CQE R E N RV I V RKCGG I Q PL VN LL VG I NQAL L VN VT KAVGAC A V E P E SKM 293
           AL + ++ + + GG+ LV L+ + L+* + ++ P +
Sbjct: 268 KCQATLALRNLASDTSYQLEI VRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327

Query: 294 1 1 DRLDGVRLLWS LLK - NPHP DVKAS AAWALC PCI KNA - KD AG EMV RS FVGG LEL I VNLL 351
           + 1 ++ L LL +++ A L ++ K+ E S G +E . L
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSEEIOCHAVSTLRNLAASSEKNRKEFFES--GAVEKCKELA 385

Query: 352 KSDNKEVLA-- SVCAAITHIAKDQENLAVITDHGWPLLSKLANTHNNKLRHHLAEAISR 409
           V + SCAI +A D L+++ ++ L + +N++ + A A+ +
Sbjct: 386 LOS PVS VQSEI SAC FA I LALA - DVS KLDLL - EANI LDAL I PMTFSQNQEVS GN AAAALAN 443

Query: 410 CCMWGRNRVAFGE H KA V AP - L V R Y L KS N DTN VH RAT AQAL Y QL S E 453
           C N E ++++ + L+R+LKS+ + QL E
Sbjct: 444 LC S RVNNYTKI I EAW DRPNEG I RG FL I RFLKS DY AT FEH I ALWT I LQLLE 493




Score = 139 (20.9 bits), Expect = 5.0e-06, P = 5.0e-06
Identities = 80/329 (24%), Positives = 142/329 (43%)

Query: 37 GGI TKLVALLDCAHD- STKPAQ S S L Y EARDV E V ARCGA LAL WSC S KSHTN K EA I RKA 92
           G IT L OH+TA +L +++ + V R AL + + S N++ + A
Sbjct: 148 GC I THLATRDDNKHK I ATSGAL I PLTKLAKSKHI RVQRN ATGALLNMTHS EENRKEL VNA

Query: 93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150
           G +P+L LL ++ ++ L A +E N + + E R++ LV ++S + ++
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEAHRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 151 QEHC AMAI YQCAEDK ETR- DLVRLHGGLK P LAS LLNNTDNKERLAAVTGAI WKCS I S KEN 209
           + +A+ AD + +-+VR GGL L L+ + D+ + A I SI N
Sbjct: 268 KCQATLALRNLASOTSYQLEIVRA-GGLPHLVKLIQS-DS I PLVLASVACIRNISIHPLN 325

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267
           + ++ LV LL EE+ +VL EHR +Gt+ L
Sbjct: 326 EGL I VDAGFLK PLVRLLDYKDS EE I QCHAVSTLRN LAAS S EKNRKEFFESGAVEKCKELA 385

Query: 268 VG-- INQALLVKVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKMPHPDVKASAAWA-L 323
           + ++ *+ A+ A A V +++ + LD + + + +N A+AA A L
Sbjct: 386 LDS PVS VQS E I S ACFA I LALA DVS KLDLLEAN I LDAL - 1 PMT FSQNQEVSGN AAAALAN L 444

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELI VNLLKSD 354
           C + N K R G + + LKSD
Sbjct: 445 CSRVHNYTKI IEAWDRPNEGI RGFLI RFLKSD 476




Score = 136 (20.4 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 72/304 (23%), Positives = 133/304 (43%)

Query: 58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLI PWCTL 117
           + L +++ + V R AL + + S N++ + AG +P+L LL ++ ++ L
Sbjct: 173 TKLAKS KH I RVQRNATGA LLNMTHS EEN RKEL VNAGAVPVLVSLLSSTDPDVQY YCTTAL 232

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174
           A +E N + + E R++ LV ++S + +++ +A+ AD «+ ++VR
Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291

Query: 175 GGLKPLAS LLNNTDNKERLAAVTGAI WKCS I SKENVTKFREYKAI ETLVGLLT-DQPEEV 233
           GGL L L+ + D+ + A I SI N + ++ LV LL EE*
Sbjct: 292 GGLPHLVKLIQS-DSIFLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350

Query: 234 LVNVVGALGECCQERE - KR V I VRKCGGIQPLVNLLVG-- 1 NQALLVNVT KAVGAC A- VEP 289
           + V L E NR + G + + L + + + ++ A+ A A V
Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410

Query: 290 ESKM 1 1 DRLDGVRLLWS LLKN PHPDVKASAAWA- LCPCIKN-AK D AGEMVRS FVGGLEL I 347
           ++ + LD + + + +N A+AA A LC + N K RG +
Sbjct: 411 LDLLEANI LDAL- 1 PHTFSQNQEVSGNAAAALANLCSRVNMYTK I IEAWDRPNEGIRGFL 469

Query: 348 VNLLKSD 354
           + LKSD
Sbjct: 470 I RFLKSD 476




Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03
Identities = 71/335 (21%), Positives = 132/335 (39%)

Query: 1 MVNI LDSPHKSLKCLAAETIANVAKFKRARRWRQHGGITKLVALLDCAHDSTKPAQSSL 60
           + + S H ++ A +N+ +R++ G + LV+LL ST P
Sbjct: 172 LTK LAKSKH I RVQRNATGA LLNMTHS EENRKELVNAGAVPVLVSLLS STD P 222

Query: 61 YEARDVEVARCGALALWSC S KSHTNKEA I RKAGG I P LLARLLKTSHENML I PWGTLQEC 120
           DV+ AL* + +++ KA++LL++ + 1+

Sbjct: 223 DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 279 

Query: 121 AS EEN YRAAI KAER 1 1 ENLVKNLNS EN EQLQEHCAMAI YQCAEDKET RDLVRLHGGLKPL 180

AS+ +Y+ I + +LVK ♦ S++ L I ♦ L+ G LKPL 

Sbjct: 279 ASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLI VDAGFLKPL 338 

Query: 181 ASLLNNTONKERIAAVTGAIWKCSISKE-NVTKFREYKAIETLVGLLTDQPEEVLVNVVG 239 

LL+ 0++E + + S E N +F E A+E L D P V + 

Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISA 398 

Query: 240 A LG ECCQE REN RVI V RKC GG I QPL VN L L VG I NQ AL L VN V T KA VG - AC A VE P ES MM 1 1 DRL 298 

+++ + + + L+ + NQ + N A+ C+ 11+ 
Sbjct: 399 C FA I LALADV S KLDLLEAN I LDAL I PMT FSQNQEVSGN AAAALAN LCS RVNN YTKI I EAW 4 58 

Query: 299 D GVR-LLWSLLKNPHPDVKASAAWALCPCIKNAKDAGE 335 

D G+R L LK+ + + A W + + ++ D E 
Sbjct: 459 DRPN EGI RGFLI RFLKSDY AT FEH I ALWT I LQLLES HN DKVE 500 

Score = 106 (15.9 bits), Expect = 2.0e-02, P = 2.0e-02
Identities = 49/204 (24%), Positives = 89/204 (43%)

Query: 65 D VE VARCG ALA - LWSCSKSHTNKEAI RKAGG IPLLARLLKTSH ENML I P VVGT LQEC A - S 122 

+VEV +C A+ + + + NK I +G + L +L K+ H + G L S 

Sbjct: 139 MV E V - QC N AVGC I TN LAT RDDN KHK I AT SG AL IPLTKLAKSKHI RVQRN ATGAL L NMT H S 197 

Query: 123 EEN Y RAA I KA E R 1 1 EN L V KN LN S EN EQLQEKC AMA IYQCAEDKETRO-LV RLKGG L - K P L 180 

EEN +■ + A ♦ LV L+S + +0 +C A+ A D+ R L + L L 
Sbjct: 198 EENRKELVNAGAV-PVLVSLL5STDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKL 256 

Query: 181 ASLLKNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNWGA 240 

SL+++ ♦+ + A T A+ + + + LV L+ +++ V 

Sbjct: 257 VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315 

Query: 241 LGECCQERENRVI VRKCGGIQPLVNLL 267 

+ N ++ G ++PLV LL 

Sbjct: 316 IRNISIHPLNEGLIVDAGFLKPLVRLL 342 
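
Each HSP header above reports identities and positives as a fraction of the aligned columns, followed by an integer percentage. The following is a small sketch of that arithmetic. It assumes the percentage is truncated rather than rounded, which is consistent with the figures printed above (for example, 81/341 is reported as 23%). The helper name `percent` is hypothetical.

```python
def percent(matches: int, columns: int) -> int:
    """Integer percentage of aligned columns, truncated toward zero."""
    return (100 * matches) // columns


# (identities, positives, aligned columns) copied from three HSP headers.
hsps = [(106, 177, 401), (81, 163, 341), (49, 89, 204)]
for ident, pos, cols in hsps:
    print(f"Identities = {ident}/{cols} ({percent(ident, cols)}%), "
          f"Positives = {pos}/{cols} ({percent(pos, cols)}%)")
```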

Pedant information for DKFZphtes3_35p17, frame 3



Report for DKFZphtes3_35p17.3

[LENGTH] 505
[MW] 55224.34
[pI] 8.43
[HOMOL] PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 2e-16
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YNL189w] 3e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL189w] 3e-06
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL189w] 3e-06
[BLOCKS] BL01265C
[BLOCKS] BL00242A Integrins alpha chain proteins
[SCOP] d3bct 1.91.1.1.1 beta-Catenin [Mouse (Mus musculus)] 7e-18
[PIRKW] cytosol 3e-11
[PIRKW] apoptosis 3e-11
[PIRKW] carcinogenesis 3e-11
[PIRKW] cell adhesion 3e-11
[PIRKW] cytoskeleton 3e-12
[SUPFAM] pendulin 1e-07
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 2.38 %

S EQ MVN 1LDSPHKSLKC LAAET I ANVAKFKRARRVVRQHGG1 TKLVALLDCAHDSTK PAQS S L 

SEG XXXXXXXXXXXX 

2bct- HH 



SEQ YEAROVEVARCGALA LWSCSKSHTNKEAI RKAGG I PLLARLLKTSHENML I PWGTLQEC 

SEG 

2bct- HHCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCHHHHHHHGGGCCCHHHHHHHHHHHHHH 

SEQ AS EEN Y RAA I K AERI I EN LVKNLNS EN EQLQEHC AMA I YQCAEDKETROLVRLHGGLKPL 

SEG 

2bCt- HHTTTHHHHHHHHCHHHHHHHHHCCCCHHHHHHHHKHHHHHHTTHHHHHHHHHHCHHHHH 
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WO 01/12659 
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SEQ ASLLNHT DN KERLAAVTGAI WKC S I S KENVTK FREYKAI ET L VGLLTDQPEEVLVNWGA 

SEG 

2bCt- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGIKQALLVNVTKAVGACAVEPESMMI IDRLDG 

SEG 

2bCt- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH 

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA 

SEG 

2bCt- HKHHHHHHHTTTKHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH 

SEQ S VC AA I TNI AKDQENLAVI TDHGW PLLS KLANTfJNNKLRHHLAEA I S RCCHWGRNRVA F 

SEG 

2bCt- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHKCHHHHH 

S EQ GEHKAVAPLVRYLKSN DTN VHRATAQALYQLSEDADNC ITMH ENGAVKLLLDMVGS PDQD 

SEG 

2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT 

SEG 

2 bCt- HKHHHHHHH 



(No Prosite data available for DKFZphtes3_35p17.3)
(No Pfam data available for DKFZphtes3_35p17.3)
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DKFZphtes3_35p22 



group: cell cycle 

DKFZphtes3_35p22 encodes a novel 549 amino acid protein with similarity to oncogene 1 (tre-2
locus).

The novel protein is closely related to human tre-2 and other enzymes involved in the
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating 
enzyme, indicating a role for the ubiquitin system in mammalian growth control. 

The novel protein can find application in cancer diagnostics and treatment, and in regulating 
protein stability and growth control via regulation of ubiquitination. 



strong similarity to oncogene 1 (tre-2 locus)
membrane regions: 1

complete cDNA, complete cds, EST hits
Sequenced by DKFZ
Locus: map="17"
Insert length: 2072 bp

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039



1 GTTACACACft GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 

51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 

101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 

151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG 

201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 

251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 

301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 

351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 

401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 

451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG

501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 

551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 

601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 

651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 

701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 

751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 

801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 

851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT 

901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 

951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 

1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 

1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 

1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 

1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 

1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 

1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT 

1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 

1351 GTGGGGCTGT CCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 

1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA 

1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT

1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 

1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 

1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 

1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 

1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 

1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 

1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 

1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG 

1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 

1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 

2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 

2051 CCTGTTGAAA TGAAAAAAAA AA 



BLAST Results 



Entry AC003976 from database EMBL: 

Homo sapiens chromosome 17, clone nCIT.91_J_4, complete sequence. 
Score = 4385, P = 0.0e+00, identities = 881/896
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14 exons 

Entry HSG19723 from database EMBL: 
human STS AO01W35. 
Score = 850, P = 1.9e-32, identities = 170/170



Medline entries 



92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme
related to a product of the human tre-2 oncogene. 

95176708: 

UBP5 encodes a putative yeast ubiquitin-specific protease
that is related to the human Tre-2 oncogene product. 



Peptide information for frame 3 



ORF from 99 bp to 1745 bp; peptide length: 549
Category: strong similarity to known protein



1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG
51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK
101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS
151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY
201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT
251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI
301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL
351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP
401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL
451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE
501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF
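
The ORF coordinates, reading frame and peptide length quoted for this clone are related by simple arithmetic. The sketch below assumes 1-based inclusive coordinates that exclude the stop codon. That assumption is inferred from the numbers themselves, since (1745 - 99 + 1) / 3 = 549. The function names are illustrative.

```python
def reading_frame(start: int) -> int:
    """Forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (start - 1) % 3 + 1


def peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


print(reading_frame(99), peptide_length(99, 1745))  # 3 549
```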

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35p22, frame 3

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score =
2181, P = 5.5e-226

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157



>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human
Length = 786



Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226
Identities = 405/500 (81%), Positives = 440/500 (88%)

Query: 1 MDVVEVAGSWWAQEREDI IMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETEIPPL 60 

MD+VE A S AQER+D I +MK Y+ KGHRAGL PEDKGP+ P N+++D GI+HETELPP+ 
Sbjct: 1 MDMVENADSLQAQERKDILMKYDKGHRAGLPEDKGPEPV-GINSSIDRFGILHETELPPV 59 

Query: 61 TAREAKQIRREISRKSKWVDMLGOWEKYKSSRKLIDRAYKGHPHNIRGPMWSVLLNTEEM 120 

TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WSVLLN +E+ 
Sbjct: 60 TAREAKKIRREKTRTSKWMEMLGEWETYKHSSKLIDRVYKGIPMNIRGPVWSVLLWIQEI 119 

Query: 121 KLKHPGRYQIKKEKGKKSSEHIQRIDROVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 180 

KLKNPGRYQIKKE+GK+SSEHI ID DV TLR fHFFRDRYG KQREL +ILLAY EY 
Sbjct: 120 KLKNPGRYQ1HKERGKRSSEHIHHIDLDVRTTLRNHVFFRDRYGAKQRELFY1I.LAYSEY 179 

Query: 181 HPEVGYCRDLSHIAALFLLYLPEEDAFWAtVQLLASERHSLQGFHSPNGGTVQGLQDQQE 240 

NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERHSL GFHSPHGGTVQG LQOQQ E 
Sbjct: 180 NPEVGYCROLSHITALFLtLYliPEEDAFWAIiVQLLASERHSLPGFHSPNGGTVQGLQDQQE 239 
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Query: 2*1 HVVATSQPKTMGHQDKKOLCCQCSPLGCLIRILIOGISLGLTLRLWDVVLVEGEQALMPI 300 

HVV SQPKTH HQDK+ LCGQC+ LGCL+R LIDGISLGLTLRLWDVYLVEGEQ LMPI 
Sbjct: 240 HWPKSQPKTKWHQDKEGLCCOCASLGCLLRNLIDGISLGLTLRLWDVyLVEGEQVLMPI 299 

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNREVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360 

T I A KVQQKRL KTSRCG WAR W+T DTWA + + DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct : 300 TSIALKVQQKRLMKTSRCGLWARLRHQFFDTMAMNODTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPOGSWRFLQWWSMPRLPTDLDVEGPWFRHYDFRQSCWV 480 

VREDTYPVGTQGVPS ALAQGGPQCSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSIALAQGGPQGSWRFLEWKSMPRLPTOLDIGGPWFPHYDFERSCWV 479 

Query: 481 RA I SQE DQLAPCWQAEH PAE 500 

RAISQEDQLA CWQAEH E 
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499 

Pedant information for DKFZphtes3_35p22, frame 3



Report for DKFZphtes3_35p22.3

[LENGTH] 549
[MW] 62159.16
[pI] 9.23
[HOMOL] PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR100w] 2e-16
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15
[PIRKW] transmembrane protein 6e-14
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.28 %



SEQ MDWEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 

5 EG 

PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 

MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 



PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 

MEM 

SEQ KLKN PG RYQIMKEKGKKS SEH I QRI DRDVSGTLRKH I FFRDR YGTKQRELLH ILLAYEEY 

SEG 

prd ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 

MEM 

SEQ NPEVG YCRDLSH I AA LFLLYLPEEDAFWALVQLLASERHSLQG FHS PNGGT VQG LQDQQE 



PRO ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 

MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 

SEG 

PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 

MEM MMMMMMMMMMMMMMMMKM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 

SEG 

PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 

MEM 






SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV

SEG

PRD ccccccccecccccccccccccccccMeetcccccccccceccccccceccccceeccc

MEM

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL



PRD cchhhhhhhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccc«««o

MEM

SEQ ESSQFPPGF

SEG

PRD ccccccccc

MEM



Prosite for DKFZphtes3_35p22.3

PS00004 136->140 CAMP_PHOSPHO_SITE PDOC00004
PS00004 310->314 CAMP_PHOSPHO_SITE PDOC00004
PS00004 348->352 CAMP_PHOSPHO_SITE PDOC00004
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005
PS00005 73->76 PKC_PHOSPHO_SITE PDOC00005
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005
PS00005 152->155 PKC_PHOSPHO_SITE PDOC00005
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00005 446->449 PKC_PHOSPHO_SITE PDOC00005
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006
PS00007 93->100 TYR_PHOSPHO_SITE PDOC00007
PS00007 92->100 TYR_PHOSPHO_SITE PDOC00007
PS00008 8->14 MYRISTYL PDOC00008
PS00008 101->107 MYRISTYL PDOC00008
PS00008 230->236 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 366->372 MYRISTYL PDOC00008
PS00008 441->447 MYRISTYL PDOC00008
PS00009 134->138 AMIDATION PDOC00009
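
The Prosite hits listed above are short sequence patterns. As an illustration, protein kinase C phosphorylation sites (PS00005) follow the consensus [ST]-x-[RK]. The sketch below scans a peptide fragment for that pattern with a regular expression. It is not the scanner that produced the report, the function name `find_sites` is hypothetical, and positions are assumed to be 1-based.

```python
import re

# PS00005 (PKC_PHOSPHO_SITE): [ST]-x-[RK], written as a regular expression.
# The lookahead allows overlapping sites to be reported.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def find_sites(peptide: str, pattern: re.Pattern = PKC_SITE) -> list[int]:
    """Return 1-based start positions of every pattern match in `peptide`."""
    return [m.start() + 1 for m in pattern.finditer(peptide)]


# Residues 61-70 of the peptide; 60 is added to report peptide coordinates.
fragment = "TAREAKQIRR"
print([pos + 60 for pos in find_sites(fragment)])  # [61]
```

The site found at residue 61 corresponds to the first PS00005 entry in the listing.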



(No Pfam data available for DKFZphtes3_35p22.3)
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DKFZphtes3_4b4 



group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human
trypsin inhibitor.

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2,
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins
from eukaryotes that have been found to be evolutionarily related. The exact function of these
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific 
genes or as a new protease inhibitor. 



Strong similarity to trypsin inhibitor
might be a new protease inhibitor?
Sequenced by AGOWA

Locus: /map="333.4 cd from top of Chr16 linkage group"
Insert length: 4574 bp

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539



1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 
51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 
101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 
151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 
201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 
251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 
301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 
351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC 
401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 
451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG 
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 
551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT 
601 GACGAGGTGA ACGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA 
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 
1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC 
1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 
1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 
1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 
1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 
1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 
1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 
1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 
1401 CCCGTTTGAA AAGCCAGCAA CTCACTGCCC AAGAATCCAT TGTCCGGCAC 
1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 
1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 
1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA 
1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 
1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 
1701 TTCCAGCACC AGGGCAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 
1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT 
1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 
1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 
1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGGGTCTCCA TCTGGACGTC 
1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 
2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC 
2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC 
2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 
2151 AAGTAGAAGG TAGTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 
2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 
2251 AGTAAGAGGG CTGCGGGTAT CAGAGACCCC GGCTCCGCCC TGGCACGTGT 
2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 
2351 GCTCTATACA CACCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG 
2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATCGC CCACCTGTTT 
2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT 
2501 TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG 
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC 
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA 
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA 
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA 
2751 GGAATGGACT CTTTGGTACA TTCCTCACCG AGGTTAGCAC CTCAGTTTGT 
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT 
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC 
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG 
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT 
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG 
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA 
3101 ATATTTCTTA GGTCAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT 
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT 
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC 
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT 
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG 
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CACCCCCAAA TGTCGGAGAG 
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT 
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG 
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT 
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC 
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA 
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG 
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT TTTTTTTTTT 
3751 TTTCCCGACA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG 
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT 
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG 
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG 
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA 
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT 
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA 
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT 
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT 
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA 
4251 CGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA 
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT 
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG 
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA 
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT 
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT 
4551 GAAAACCAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS834352 from database EMBL: 
human STS WI-15502. 
Score = 1331, P = 5.4e-54, identities = 287/301 



Medline entries 



98146272: 

cDNA cloning of a novel trypsin inhibitor with similarity to 
pathogenesis-related proteins, and its frequent expression in 
human brain cancer cells. 



Peptide information for frame 1 



ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 
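
The ORF coordinates and the reported peptide length are linked by simple arithmetic. Judging from the ORFs listed in this section (205..1695, 57..2024 and 762..3131), the range is 1-based, inclusive, and does not include the stop codon. The following is a minimal sketch of that check; the function name `orf_peptide_length` is illustrative and not part of the annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based inclusive coordinates that exclude the stop codon,
    which is how the ORF lines in this report appear to be written.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


if __name__ == "__main__":
    # The three ORFs reported in this section.
    for start, end in [(205, 1695), (57, 2024), (762, 3131)]:
        print(start, end, orf_peptide_length(start, end))
```

For the ORF above this gives 497, in agreement with the stated peptide length.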



1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA 
51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE 
101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC 
151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY 
201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN 
251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG 
301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI 
351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP 
401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS 
451 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1 

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung 
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) 
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97 

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA 
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 738, P = 
4.5e-73 

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA, 
complete cds., N = 1, Score = 345, P = 2e-31 

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1 
precursor - human, N = 1, Score = 337, P = 1.7e-30 

>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein 
1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete 
cds. 

Length = 188 

Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97 
Identities = 160/185 (86%), Positives = 170/185 (91%) 
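
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded: 170/185 is about 91.9%, yet it is reported as 91%, and the same holds for the other HSPs in this section (for example 167/336 reported as 49%). A small sketch of that convention follows; `blast_percent` is an illustrative name, and the truncation rule is inferred from the numbers in this report rather than taken from BLAST documentation.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (inferred convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


if __name__ == "__main__":
    print(blast_percent(160, 185))  # identities of the HSP above
    print(blast_percent(170, 185))  # positives of the HSP above
```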



Query:  61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120 
            LHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL  HWGR 
Sbjct:   1 KLHNKLRGQVYPPASNMEYMTWDEELERSAAAWAQRCLWEHGPASLLVSIGQNLAVHWGR 60 

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180 
           YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC 
Sbjct:  61 YRSPGFHVQSWYDEVKDYTYPYPHECNPWCPERCSGAMCTHYTQMVWATTNKIGCAVHTC 120 

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240 
           R M+VWG++WENAVY VCNYSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y 
Sbjct: 121 RSMSVWGDIWENAVYLVCNYSPKGNWIGEAPYKHGRPCSECPSSYGGGCRNNLCYREEHY 180 

Query: 241 TPKPE 245 
             KPE 
Sbjct: 181 HQKPE 185 



Pedant information for DKFZphtes3_4b4, frame 1 



Report for DKFZphtes3_4b4.1 



[LENGTH] 497 
[MW] 55920.00 
[pI] 8.36 
[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25 kDa trypsin inhibitor, complete cds. 6e-78 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078c] 8e-12 
[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins 
[PIRKW] glycoprotein 5e-22 
[PIRKW] blocked amino end 5e-13 
[PIRKW] brain 9e-30 
[PIRKW] hydrolase 4e-09 
[PIRKW] hemolymph coagulation 4e-09 
[PIRKW] zymogen 4e-09 
[PIRKW] alternative splicing 4e-09 
[PIRKW] sperm 5e-22 
[PIRKW] viroid-induced protein 2e-11 
[PIRKW] venom 6e-18 
[PIRKW] pyroglutamic acid 2e-11 
[PIRKW] transmembrane protein 2e-10 
[PIRKW] serine proteinase 4e-09 
[SUPFAM] C-type lectin homology 4e-09 
[SUPFAM] trypsin homology 4e-09 
[SUPFAM] complement factor H repeat homology 4e-09 
[SUPFAM] cysteine-rich secretory protein 1 6e-24 
[SUPFAM] pathogenesis-related leaf protein 7e-15 
[PROSITE] MYRISTYL 8 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 3 
[PROSITE] SCP_AG5_PR1_SC7_2 1 
[PFAM] SCP-like extracellular Proteins 
[KW] SIGNAL_PEPTIDE 23 
[KW] LOW_COMPLEXITY 1.21 % 



SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL 
SEG 
PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh 

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 
SEG 
PRD hhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeecc 

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 
SEG 
PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec 

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 
SEG 
PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc 

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVNYMTQVVRCDTKMKDRCKG 
SEG 
PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc 

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV 
SEG 
PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee 

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE 
SEG 
PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc 

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES 
SEG 
PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecceeeeeee 

SEQ LGTPRDGKAFRIFAVRQ 
SEG 
PRD ccccccccceeeeecce 



Prosite for DKFZphtes3_4b4.1 

PS00001 27->31 ASN_GLYCOSYLATION PDOC00001 
PS00001 41->45 ASN_GLYCOSYLATION PDOC00001 
PS00001 451->455 ASN_GLYCOSYLATION PDOC00001 
PS00004 181->185 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 276->280 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 464->468 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005 
PS00005 179->182 PKC_PHOSPHO_SITE PDOC00005 
PS00005 201->204 PKC_PHOSPHO_SITE PDOC00005 
PS00005 228->231 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 362->365 PKC_PHOSPHO_SITE PDOC00005 
PS00005 471->474 PKC_PHOSPHO_SITE PDOC00005 
PS00005 483->486 PKC_PHOSPHO_SITE PDOC00005 
PS00006 29->33 CK2_PHOSPHO_SITE PDOC00006 
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006 
PS00006 81->85 CK2_PHOSPHO_SITE PDOC00006 
PS00006 130->134 CK2_PHOSPHO_SITE PDOC00006 
PS00006 453->457 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00007 385->393 TYR_PHOSPHO_SITE PDOC00007 
PS00008 111->117 MYRISTYL PDOC00008 
PS00008 115->121 MYRISTYL PDOC00008 
PS00008 174->180 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 227->233 MYRISTYL PDOC00008 
PS00008 300->306 MYRISTYL PDOC00008 
PS00008 447->453 MYRISTYL PDOC00008 
PS00008 470->476 MYRISTYL PDOC00008 
PS01010 195->207 SCP_AG5_PR1_SC7_2 PDOC00772 
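
The short Prosite motifs listed for this protein are simple residue patterns, so the start positions can be checked directly against the peptide. The sketch below is an illustration only: it encodes three well-known patterns (PS00001 `N-{P}-[ST]-{P}`, PS00005 `[ST]-x-[RK]`, PS00006 `[ST]-x(2)-[DE]`) as regular expressions and scans the first 60 residues of the frame 1 peptide. The helper name `scan` and the dictionary layout are assumptions, not the tool that produced the report.

```python
import re

# Prosite patterns rewritten as regular expressions (illustrative subset).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # PS00006: [ST]-x(2)-[DE]
}

# First 60 residues of the DKFZphtes3_4b4 frame 1 peptide.
PEPTIDE_1_60 = "MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL"


def scan(pattern: str, peptide: str) -> list:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() + 1 for m in re.finditer("(?=(" + pattern + "))", peptide)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, scan(pattern, PEPTIDE_1_60))
```

Within these 60 residues the scan finds N-glycosylation sites starting at 27 and 41 and a CK2 site starting at 29, which agrees with the corresponding start positions in the listing.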



Pfam for DKFZphtes3_4b4.1 



hmm_name SCP-like extracellular Proteins 

KMM * PQDEQDEWLNlcHNOFROQVGRGLETRGNPGPQPPAsNHnPMVWHDELAt 
p + + + E+L HN +R QV P ASNM M+W+DEL + 

Query 52 PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 

HMM 1 AQnWANQCi TOHHDCCWNH 3f> Y PYGQNIAWWS sTANn PWnWs 3MIQMW Y 

A WA+QCI +H *+ + S GQN+ + + ++++ +Q+WY 
Query 89 SAAAWASQCIWEHGPTSLLVSI---GQNLGAHWG--RYRSPGFHVQSWY 

HUM MEv IcD YN YNWNTC kGG NHFmVCGHYTQNVWRnT f r IGCGRY IC YC 

+EVKDY Y + + +C HYTQ+VW+ T +IGC+ C+ 

Query 133 DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 

HMM NNNWrKPDPWKhkWYYVCNYCPpGNYmN* 

+ U + W+ +Y VCNY P+GN+++ 
Query 183 MTVW--GEVWENAVYFVCNYSPKGNWIG 208 
DKFZphtes3_4f17 



group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to methyl-CpG-binding 
proteins. 

Methylation at the DNA sequence 5'-CpG is required for mammalian development. Methyl-CpG-binding 
proteins bind specifically to methylated DNA via a related amino acid motif and can 
repress transcription. The novel protein does not contain such a motif. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to methyl-CpG-binding protein 

extension of HSS57771/HSZ78337, 

there are some differences to these sequences 

Sequenced by AGOWA 

Locus: /map="18" 

Insert length: 2320 bp 

Poly A stretch at pos. 22(6, polyadenylation signal at pos. 2251 



1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGCGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGC ACTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 
351 AGCAGTGAGC CCCGGGATGA GGCTGGAGGG CGCAAGAGGC CTGTCCCTGA 
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC 
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG 
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CAGCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC 
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGGAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 
1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC 
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTCTGCCGCC 
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT 
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACCTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCACAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA 
Entry HSS57771 from database EMBLEST: 

Human chromosome 18 clone 2 mRNA sequence. 

Score = 1582, P = 0.0e+00, identities = 1560/1598 

Entry HSZ78337 from database EMBLEST: 

H. sapiens mRNA; expressed sequence tag ICRFp507H02194 (5') 
Score = 6339, P = 9.0e-281, identities = 1307/1347 

Entry K5095149 from database EMBL: 
human STS WI-6941. 
Score = 1210, P = 2.2e-49, identities = 246/251 



Medline entries 



96449942: 

Identification and characterization of a family of mammalian methyl-CpG 
binding proteins. 

9824997: 

Gene silencing by methyl-CpG-binding proteins. 



Peptide information for frame 3 



ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 



1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF 
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS 
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT 
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR 
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK 
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD 
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ 
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA 
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR 
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR 
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP 
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE 
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL 
651 RSSADR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4f17, frame 3 

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid 
F52B11, N = 2, Score = 316, P = 8.8e-27 

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene, 
partial cds., N = 2, Score = 163, P = 2.8e-13 

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative 
transcriptional regulatory protein, phd finger containing"; S.pombe 
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12 

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein 
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA, 
complete cds., N = 2, Score = 189, P = 7.6e-11 



>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 
Length = 523 

HSPs: 

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 100/336 (29%), Positives = 167/336 (49%) 

Query: 333 REKKSEKKKEERYKRHRQ-KQKHKDKWKHPERADAKDPASLP-QCLGPGCVRPAQPSSKY 390 

+++K+ E Y +R +Q+ D + + +A +P P OCL P C+ ++ SKY 
Sbjct: 11B QQRKANIINEROYVPNRPTROQSADLRRKRTOLNA-EPDKHPRQCLMPNCIYESRIDSKY 176 

Query: 391 CSDDCGHKLAANRI YEI LPQRIQQW QQSPCIAEEHGKKLLERI RREQQS ART RLQ 445 

CSD+CG +LA Rt EILP R +Q+ P E* K +1 RE Q + 

Sbjct: 177 CSDECGKEIAPJ1RLTEILPNRCKQYFFEGPSGGPRSLEDEIKPKRAKIHREVQKLTESEK 236 

Query: 446 EMERRFKEL-EAIILRAKQQAVREDEESNEGOSDDTDLQIFCVSCGHPINPRVAL-RHME 503 

M + + L E I + K Q + +E D +L C+ CG P P + +H+E 

Sbjct: 237 NMMAFLNKLVEFIKTQLKLQPLGTEERY DDNLYECCIVCGLPDIPLLKYTKHIE 290 

Query: 504 RCYAKYESQTSFGSMYPTR1EGATRLFCDVYNPQSKTYCKRLQVLCPEHSRDPKVPADEV 563 

C+A+ E SFG+ P + +C+ Y+ ♦+ ++CKRL* LCPEH •» +V 

Sbjct: 291 LCWABSEKAISFGA— PEK--NKDMFYCEKYDSRTNSFCKRLK5LCPEHRKLGDEQHLKV 346 

Query: 564 CGCP - LVRDVFELTGDF CRLPKRQCNRHYCWEKLRRAEVDLERVR 607 

CG P V + ♦ E* F CR K C + + H> U R + + LE+ 

Sbjct: 347 CGYPKKWEDGMIETAKTVSELIEMEDPFGEEGCRTKKDACHKHHKWIPSLRGTIELEQAC 406 

Query: 60 B VWYKLDELFEQ- - ERHVRTAMTNRAGLLALMLHQT I QH DPLTTDLRSSA 654 

++ K+ EL + + N T A L++M+H+ + + LR+ A 

Sbjct: 407 LFQKMYELCHEMHKLNAHAEWTTNA — LSIMMHKQPSTEKCSFFLRNFA 453 

Score = 53 (8.0 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 24/100 (24%), Positives = 41/100 (41%) 

Query: 169 CGECEACRRTEDCGHCDFCR DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFFSS 222 

C C C +*CG C CR DM+K F +K ♦ RQ + + + 

Sbjct: 17 CMHCIRCNDEKNCGTCWPCRHGKTCDMRKCFSAKRLYKEKVK-RQTDEHLK-AIMAKTAQ 74 

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264 

+ + P P+ +QQ + +K GR ♦ G A** 
Sbjct: 75 REAAHQAATTTAPSAPWI EQQVE-KKKRGRKKGSGNGGAAAA 116 

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26 
Identities = 13/39 (33%), Positives = 19/39 (48%) 

Query: 179 EDCGHCOFCROHKKFGG — PNKIRQKCRLRQCQLRARESY 216 

EC+CCDKG P + ♦ C +R+C A* Y 
Sbjct: 15 ERCMNCIRCNDEKNCGTCWPCRNGKTCDMRKC-FSAKRLY 53 

Pedant information for DKFZphtes3_4f17, frame 3 

Report for DKFZphtes3_4f17.3 

[LENGTH] 656 
[MW] 75711.71 
[pI] 8.61 
[HOMOL] TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e-25 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 2 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 18.75 % 
[KW] COILED_COIL 4.57 % 

SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK 
SEG 
PRD cccccecccccccccccccccccccceeeeeeccccceeeeecccccccccecchhhhhh 
COILS 

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR 
SEG 
PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc 
COILS 

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED 
SEG xxxxxxxxx 
PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc 
COILS 

SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP 
SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 
PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc 
COILS 

SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD 
SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh 
COILS 

SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC 
SEG 
PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch 
COILS 

SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT 
SEG xxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccc 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT 
SEG x 
PRD ceeeeeeeccceccccchhhhhhhhhhhhhhcccccccccccccecceeeeeecccccce 
COILS 

SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE 
SEG 
PRD cchhhhhhhccccccccccceeeeccccchhhhhceccecceccccccchhhhhhhhhhh 
COILS 

SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 
COILS 



Prosite for DKFZphtes3_4f17.3 

PS00002 124->129 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005 
PS00005 165->168 PKC_PHOSPHO_SITE PDOC00005 
PS00005 215->218 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 265->268 PKC_PHOSPHO_SITE PDOC00005 
PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005 
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005 
PS00005 439->442 PKC_PHOSPHO_SITE PDOC00005 
PS00005 627->630 PKC_PHOSPHO_SITE PDOC00005 
PS00006 6->10 CK2_PHOSPHO_SITE PDOC00006 
PS00006 17->21 CK2_PHOSPHO_SITE PDOC00006 
PS00006 227->231 CK2_PHOSPHO_SITE PDOC00006 
PS00006 265->269 CK2_PHOSPHO_SITE PDOC00006 
PS00006 280->284 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 521->525 CK2_PHOSPHO_SITE PDOC00006 
PS00006 652->656 CK2_PHOSPHO_SITE PDOC00006 
PS00007 339->346 TYR_PHOSPHO_SITE PDOC00007 
PS00007 500->507 TYR_PHOSPHO_SITE PDOC00007 
PS00007 211->219 TYR_PHOSPHO_SITE PDOC00007 
PS00008 42->48 MYRISTYL PDOC00008 
PS00008 123->129 MYRISTYL PDOC00008 
PS00008 125->131 MYRISTYL PDOC00008 
PS00008 129->135 MYRISTYL PDOC00008 
PS00008 259->265 MYRISTYL PDOC00008 
PS00008 396->402 MYRISTYL PDOC00008 
PS00009 107->111 AMIDATION PDOC00009 
PS00009 425->429 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_4f17.3) 
DKFZphtes3_4f5 



group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins. 

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. In addition, a cytochrome c family 
heme-binding site signature is present. The protein is larger (790 amino acids) than the usual 
eukaryotic G-beta transducins (about 340 amino acids). 

The new protein can find application in modulating/blocking G-protein-dependent pathways. 



similarity to S.pombe "beta-transducin" 

complete cDNA, EST hits 
complete cds, 

on genomic level encoded by HS313D11, at least ^ exons; these exons 
match only partially with the predicted transcripts in HS313D11 
Sequenced by AGOWA 
Locus: /map="16p13.3" 
Insert length: 3166 bp 

No poly A stretch found, no polyadenylation signal found 



1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 

51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CTGGTTCTCG 

101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC AGGAACCCTG 

151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 

201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 

251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 

301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 

351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 

401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 

451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC 

501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 

551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 

601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG GGACCAAGCA 

651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG ACCCACCCAG 

701 GCTGACCAGG CCAGCCCACC TCACTGACCT CCTGACCCCT GACCTCATCA 

751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG TGACCACAGC CCTGGGTGGC 

801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATC CTCCCGCCAA 

851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 

901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 

951 AACCTGCGTG TGGGGCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT 

1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 

1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 

1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 

1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 

1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 

1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC 

1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG 

1351 ACCGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 

1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 

1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG GAGATGCACT 

1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 

1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 

1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT GAGGAACACC 

1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC ACCCCCACGA CCCCTCCTTC 

1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 

1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC 

1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT GGCTGCCGAG 

1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 

1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 

1951 TCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCG CTGGTTTGTG 

2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG 

2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 

2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC 

2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 

2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 

2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG 

2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 

2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG 
2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT 
2451 GAGTCCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA 
2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG 
2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC 
2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC 
2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC 
2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG 
2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA 
2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG 
2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC 
2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCACCCGGGG 
2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC 
3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC 
3051 GGCCACCTGC AGCACATCAT CAAGTCGCTG GAAGGCAGCT CCCACTGTCC 
3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG 
3151 CTTGCCCGGG CGGCCG 



BLAST Results 



Entry HS313D11 from database EMBL: 

Human DNA sequence from cosmid 313D11 from a contig on the short arm of 
chromosome 16. Contains ESTs, STS and CpG islands. 
Score = 6238, P = 0.0e+00, identities = 1318/1391 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 
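
The peptide is the conceptual translation of the ORF in frame 3, starting at the ATG at position 762. As a hedged illustration, the sketch below translates the first ten codons of that ORF with the standard genetic code. The compact codon table and the function name `translate` are illustrative choices, and the input is the 30-nucleotide stretch that begins at position 762 of the cDNA sequence above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame until the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # Nucleotides 762..791 of the DKFZphtes3_4f5 cDNA.
    orf_start = "ATGGAGAAGATGTCCCGTGTGACCACAGCC"
    print(translate(orf_start))
```

This yields MEKMSRVTTA, the first ten residues of the peptide.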



1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK 
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV 
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL 
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE 
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT 
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT 
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD 
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF 
401 ETEPGGGGMR WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT 
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL 
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL 
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS 
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV 
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS 
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV 
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS 

BLASTP hits 

Entry YDSB_SCHPO from database SWISSPROT: 

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN 
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product: 
"beta-transducin"; S.pombe chromosome I cosmid c4F8. 
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639 

Entry PEX7_HUMAN from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:HSU76560_1 gene: "Pex7"; product: "peroxisome targeting signal 
2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA, 
complete cds. >TREMBL:HSU38871_1 gene: "HsPEX7"; product: "HsPex7p"; 
Human HsPex7p (HsPEX7) mRNA, complete cds. 
Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244 

Entry PEX7_MOUSE from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus 
peroxisomal PTS2 receptor mRNA, complete cds. 
Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240 






Entry ATAC2294_7 from database TREMBL: 

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic 
sequence, complete sequence. 

Score = 232, P = 3.4e-14, identities = 69/260, positives = 120/260 

Entry S66835 from database PIR: 

probable membrane protein YOL138c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCYOL138C_1 S.cerevisiae chromosome XV reading frame ORF 
YOL138C 

Score = 136, P = 2.3e-13, identities = 24/77, positives = 44/77 



Alert BLASTP hits for DKFZphtes3_4f5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_4f5, frame 3 

Report for DKFZphtes3_4f5.3 



[LENGTH] 790 
[MW] 6B207.10 
[pI] 6.05 
[HOMOL] SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN CHROMOSOME I. 9e-44 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16 
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 TFIID subunit] 9e-09 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL195w] 2e-07 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL195w] 2e-07 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 5e-07 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 3e-07 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 8e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 3e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 1e-05 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 1e-05 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 2e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR272w] 6e-04 
[SCOP] d1gotb_ 2.46.3.1.1 beta1-subunit of the signal-transducing 5e-06 
[PIRKW] duplication 7e-10 
[PIRKW] signal transduction 7e-08 
[PIRKW] peroxisome 9e-06 
[PIRKW] heterotrimer 7e-08 
[PIRKW] GTP binding 7e-08 
[PIRKW] peroxisome biogenesis 9e-06 
[PIRKW] transmembrane protein 1e-11 
[SUPFAM] MSI1 protein 7e-10 
[SUPFAM] WD repeat homology 1e-14 
[SUPFAM] GTP-binding regulatory protein beta chain 7e-08 
[SUPFAM] PRL1 protein 3e-08 
[SUPFAM] coatomer complex beta' chain 1e-06 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] WD_REPEATS 3 
[PROSITE] MYRISTYL 10 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] wd domain, G-beta repeats 
[KW] All_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 2.28 % 
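
Pedant [FUNCAT] entries in these reports follow a regular layout: a functional category code, a description, the matching S. cerevisiae ORF in brackets, and an E-value. The following is a minimal Python sketch for splitting such a line into fields; the function name `parse_funcat` and the exact regular expression are illustrative assumptions, not part of the Pedant software.

```python
import re

# Assumed layout: "[FUNCAT] <code> <description> [S. cerevisiae, <ORF ...>] <e-value>"
FUNCAT_RE = re.compile(
    r"^\[FUNCAT\]\s+(\S+)\s+(.*?)\s+\[S\. cerevisiae, ([^\]]+)\]\s+(\S+)\s*$"
)

def parse_funcat(line):
    """Return (code, description, orf, evalue) or None if the line does not match."""
    m = FUNCAT_RE.match(line)
    if m is None:
        return None
    code, description, orf, evalue = m.groups()
    return code, description, orf, float(evalue)

example = "[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07"
print(parse_funcat(example))
```

Entries whose ORF field carries extra text (for example a family name after the ORF identifier) are returned with that text still attached to the ORF field.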



SEO ME KMS R VTT A LGGS VLTGRTM HCH L OA P AN A I S VC RDAAQW V AG RSIFKIYAIEEEQFV 

SEG 

IgOtB , 

SEQ E KLN LRVG RX P S LN LSC ADWWHQKDEN L LAT AAT NGWVTW N LG R PS RNKQDQ L FT E H K 

SEG 

IgotB TTCEEEEEETTTEEEEEET-TTTCEEE — EEECCC 

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVOFSIROYFTrA 

SEG 

IgOtB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTTEEEEEECBTTCCEEEEEETTTTTEEEE 

SEQ ST FENGNVQL W D I RRP DRC E RM FT AHNG PV FCC DW HP ED RGW LATGGRDKMV K VWDMTT H 

SEG 

IgOtB E-ETTTEEEEEETTTTEEEE-EEECCCCCEEEEEE-TTTTCCEEEEETTTEEEEEC 

SEO RAKEKHCVQTIASVARVKWRPECRHHLATCSMMVDHNI YVWDVRRPFVPAAMFEEHROVT 

SEG 

IgotB 

SEQ TG I A W RH PH D P S FL L SG S KQS S LC QHL FRDASQ P VERAN PEG LC YG LFGD LA FAAK E S L V 

SEG ' 

IgotB 

SEQ AAESGRKPYTG DRRH P I FFK RK LD P A£ P FAG LAS SALS V FET E PGGGGMRW FVDT AER YA 

SEG 

IgotB 

SEQ LAG R P LA ELC DHN AK V ARE LG RNQV AQT WTMLRI I YCS PG LV FT AN LN HS VG (CGG SCG L P 

SEG 

IgotB 

SEQ LffflSFMLKDMAPGLGSETRLDRSXGDARSDTVLLDSSATLITNEDNEETEGSOVPAOYLL 

SEG i"«n 

IgotB 

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPOEAFPLRHEIVDTPPCPEHLQDKADSPHVSGS 

SEG xxxxxxxxxxxxxx 

IgotB 

SEQ EADVASLAPVDSS FSLLS VSKALYDSRL PPD FFGVLVRDMLH FYAEQGDVQKAVS VL I VL 

SEG 

IgotB 

SEQ GERVRKDI DEQTQEHHYTSY I DLLQRFRLlfNVS N EW KLST S RA VSCLNQ AS TTLH VN C S 

SEG 

IgotB 

SEQ HCKRPMSSRGWVCDRCH RCASMC AVCH H WKGL FVWCQGC S HGG H LQH I MKWLEGS S H C P 

SEG 

IgotB 

SEQ AGCGHLCEYS 

SEG 

IgotB 



Prosite for DKFZphtes3_4f5.3 

PS00001   74->78     ASN_GLYCOSYLATION   PDOC00001 
PS00001   468->472   ASN_GLYCOSYLATION   PDOC00001 
PS00001   691->695   ASN_GLYCOSYLATION   PDOC00001 
PS00001   719->722   ASN_GLYCOSYLATION   PDOC00001 
PS00004   69->73     CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   152->156   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   17->20     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   165->168   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   172->175   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   239->242   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   364->367   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   701->704   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   727->730   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   76->80     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   165->169   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   172->176   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   181->185   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   398->402   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   498->502   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   503->507   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   522->526   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   598->602   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   600->604   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   679->683   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   337->346   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   13->19     MYRISTYL            PDOC00008 
PS00008   97->103    MYRISTYL            PDOC00008 
PS00008   139->145   MYRISTYL            PDOC00008 
PS00008   161->167   MYRISTYL            PDOC00008 
PS00008   317->323   MYRISTYL            PDOC00008 
PS00008   342->348   MYRISTYL            PDOC00008 
PS00008   391->397   MYRISTYL            PDOC00008 
PS00008   460->466   MYRISTYL            PDOC00008 
PS00008   474->480   MYRISTYL            PDOC00008 
PS00008   759->765   MYRISTYL            PDOC00008 
PS00009   67->71     AMIDATION           PDOC00009 
PS00009   364->368   AMIDATION           PDOC00009 
PS00190   743->749   CYTOCHROME_C        PDOC00169 
PS00678   90->105    WD_REPEATS          PDOC00574 
PS00678   223->238   WD_REPEATS          PDOC00574 
PS00678   269->284   WD_REPEATS          PDOC00574 



Pfam for DKFZphtes3_4f5.3 



HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

+♦ HN-n-V C* ++P+ R +++G++D+ +++WD 
Query 203 FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236 
DKFZphtes3_4h6 



group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP binding and hydrolysis. The novel protein is similar to kinesin light chain, which is part 
of the functional tetrameric kinesin holoenzyme. The light chain has been proposed to 
function in coupling of cargo to the heavy chain or in the modulation of the ATPase activity 
of the heavy chain. The novel protein contains two kinesin light chain repeats and one RGD 
cell-attachment site. 

The novel kinesin protein can find application in modulating the function of kinesin and 
modulating intracellular transport via/on microtubules. 



strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893 



1 GGCGCGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 
51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 
101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 
151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 
201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 
251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 
301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 
351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 
401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GCCGCAGGTG CGGCGTCTGG 
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG 
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCCAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGAGAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTGTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA 
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 
1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 
1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG AGAAGGTCCT GGGCAAGTTT 
1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 
1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 
1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 
1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA 
1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 
1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 
1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 
1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 
1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 
1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 
1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 
1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 
1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 
1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC 
1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 
1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 
1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 
1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 
2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 
2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 
2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 
2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 
2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 
2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 
2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 
2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 
2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 
2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGCCC TTAATCACCC 

2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 

2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 

2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 

2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 

2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 

2751 CACCCCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 

2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 

2851 GCGGGTGAGG CGCCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 

2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 
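
The reported poly A and polyadenylation signal positions can be checked against the end of the listing. The following is a minimal Python sketch, assuming the canonical AATAAA hexamer as the signal and a run of at least ten A residues as the poly A stretch; the function name, the run threshold and the interpretation of the reported positions as 0-based offsets are assumptions made for illustration.

```python
import re

# Positions 2851-2950 of the insert, typed from the listing above.
TAIL = (
    "GCGGGTGAGG" "CGCCTGCTCT" "CTATTTTCAG" "ATGTTGCTGT" "AGAAATAAAG"
    "ACGGTTTAAA" "TCTGAAAAAA" "AAAAAAAAAA" "AAAAAAAAAA" "AAAAAAAAAA"
)
TAIL_START = 2851  # 1-based coordinate of TAIL[0]

def find_polya_features(seq, first_pos, min_run=10):
    """Return (signal, polya) as 0-based offsets into the full insert (None if absent)."""
    base = first_pos - 1                      # 0-based offset of seq[0]
    sig = seq.find("AATAAA")                  # first canonical signal hexamer
    run = re.search("A{%d,}" % min_run, seq)  # first long run of A residues
    signal = base + sig if sig >= 0 else None
    polya = base + run.start() if run else None
    return signal, polya

signal, polya = find_polya_features(TAIL, TAIL_START)
print(signal, polya)
```

Under these assumptions the sketch returns 2893 for the signal and 2914 for the start of the A run, which agrees with the annotated positions if they are read as 0-based offsets.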



BLAST Results 



No BLAST result 



Medline entries 



98288268: 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 



Peptide information for frame 3 



ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265) 
KINESIN_LIGHT (265-307) 
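
The frame number and peptide length in an ORF line follow from the nucleotide coordinates. The following is a minimal Python sketch of that arithmetic, assuming 1-based inclusive coordinates that exclude the stop codon; the function names are hypothetical.

```python
def reading_frame(start):
    """Forward reading frame (1, 2 or 3) of a 1-based ORF start position."""
    return (start - 1) % 3 + 1

def peptide_length(start, end):
    """Number of codons between 1-based inclusive coordinates start..end."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3

# ORF from 144 bp to 2009 bp: (2009 - 144 + 1) / 3 = 622 codons in frame 3.
print(reading_frame(144), peptide_length(144, 2009))
```

The same arithmetic applied to an ORF from 134 bp to 3673 bp gives frame 2 and 1180 residues.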



1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 

51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 

101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 

151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 

201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 

251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 

301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 

351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 

401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 

451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG 

501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 

551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 

601 LSDSRTLSSS SMDLSRRSSL VG 
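
The peptide can be cross-checked against the cDNA by translating the ORF codon by codon. The following is a minimal Python sketch using the standard genetic code; the helper name `translate` is an assumption, and only the first ten codons from position 144 of the listing are used as input.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons of an upper-case DNA string; '*' marks a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# First ten codons of the ORF starting at position 144 of the insert.
orf_start = "ATGGCCATGATGGTGTTTCCGCGGGAGGAG"
print(translate(orf_start))
```

The translated residues can then be compared with the start of the peptide listing, which is how single-character OCR errors in either listing can be spotted.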

BLASTP hits 



no BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score 
= 2824, P = 4e-294 

PIR:I53013 kinesin light chain - human, N = 1, Score = 1927, P = 
4.5e-199 

PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P = 
3.2e-198 

SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P = 
3.2e-198 



>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 

HSPs: 
Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294 
Identities = 558/598 (93%), Positives = 572/598 (95%) 
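
The percentages in the Identities and Positives fields can be recomputed from the counts. The following is a minimal Python sketch, assuming the reported percentage is the ratio truncated to a whole number, which is what the values in this listing suggest; the function name is hypothetical.

```python
def percent_truncated(matches, aligned_length):
    """Percentage of matching positions, truncated (not rounded) to an integer."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length

# Identities = 558/598 and Positives = 572/598 from the HSP above.
print(percent_truncated(558, 598), percent_truncated(572, 598))
```

Truncation rather than rounding is an inference from the numbers shown: 572/598 is 95.65 %, which is reported as 95 %.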



Query: 1 MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60 
         MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL + EAGEAEPGSQERC+L 
Sbjct: 1 MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60 

Query: 61 LRRSLEA I ELG LGEAQV I LALS S H LGA VES EKQK L RAQVRRL VQENQWL REE LAGTQOK L 120 

LRRSLEAIELGLGEA0V11ALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 
Sbjct: 61 LRRS L EA I ELG LGEAQ" I LAL S SHLGA VES EKQK LRAQVRRLVQENQWL REE LAGTQQK L 120 

Query: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 190 

QRS E QA V AQLEEE KQH LL FM SG I RK L DE P EEKGDVPKD+LDOLFPHEDEQSPAPSP 
Sbjct: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDE-HLPQEEKGDVPKDSLDDLFPKEDEQSPAPSP 179 

Query: 161 GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240 

GGGDV+ QHGG YE I PAR LRT LH NL VI QY ASQG RY E V A V PLC KQALEDLEKT SG H DH P DV A 
Sbjct: 180 GGGDVAAQHGGYEIPARLRTl^NLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 239 

Query: 241 TMLKILALVYRDQNKrKEAAHLLHDALAIREKTLGKDHPAVAATLNHLAVLYGKRGKYKE 300 

TMLNILALVYRDQNKYK+AAHLLNDALAIREKTLGKDHPAVAATLNKLAVLYGKRGKYKE 
Sbjct: 240 THL N I LA L V YR DQNK Y K DAAH L LNQALA I REKTLGK DH F A V AAT LNN LA V L YG K RGK Y K E 299 

Query: 301 A E P LC KRALE I REK V LG K rH PD VAKQLS N LA L LCQHQGKA EE V E Y Y YRRAL EIYATRLGP 360 

AEPLCKRALEIREKVLGKFH PDVAKQLSNLALLCQNQGKAEEVEYYYRRALEI YATRLGP 
Sbjct: 300 AEPLCK RALE I REKVLGKFHPDVAKQLSNLALLCQNQCKAEEVEYYYRRALE I YATRLGP 359 

Query: 361 DDPttVAKTKNHLASCYLKOGKYQDAETLYKEILTRAKEKEFGSVNGDNKPIWMHAEEREE 420 

DOPNVAKTKNMLASC YLKQGK YQDAETL Y KEI LTRAHEKEFGS VNG+NKPI WMKAEEREE 
Sbjct: 360 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENK PI WMKAEEREE 419 

Query: 421 5 KDKRRDSAP YGEYG5 WYKACKVDS PTVNTTLRS LGALYRRQGKLEAAHTLEDCASRNRK 480 

SKDKRRD P E YGS W YKAC K VDS PT V NTTLR * LGA L Y R +GKLEAAHTLEDCASR+ RK 
Sbjct: 420 S KD K RRORRPH -E YGStt Y KAC K VDS PT VNTTLRT LGAL YRP EGK L EAAH TLE DCAS R5 RK 478 

Query: 481 QGLD P ASQT K WE L LK DC S G RAG D RRS S RDMAGGAG PRS ES DLE D VG PT AEWHGDG SGS L 540 

QGLDPASQTKWELLKDGSGR G RR SRD+AG P+SESDLE* GP AEW+GDGSGSL 
Sbjct: 479 QGLDPASQTKWELLKDGSGR-GKRRGSRDVAG---PQSESDLEESGPAAEWSGDGSGSL 534 

Query: 541 RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPHPRHKRASSLHFLNKSVEEPTQPGG 598 

RRSG5FGKLRDALRR5SEMLV+KLQGG PQEP N RMKRASSLNFLNKSVEEP QPGG 
Sbjct: 535 RRSGSFGKLRDALRRSSEMLVRKLQGGGPQEP-NSRMKRASSLNFLKKSVEEPVQPGG 591 



Pedant information for DKFZphtes3_4h6, frame 3 



Report for DKFZphtes3_4h6.3 



[LENGTH] 622 
[MW] 68934.92 
[pI] 6.12 
[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0 
[BLOCKS] BL00927C Trehalase proteins 
[BLOCKS] BL01160I Kinesin light chain repeat proteins 
[BLOCKS] BL01160H Kinesin light chain repeat proteins 
[BLOCKS] BL01160G Kinesin light chain repeat proteins 
[BLOCKS] BL01160F Kinesin light chain repeat proteins 
[BLOCKS] BL01160E Kinesin light chain repeat proteins 
[BLOCKS] BL01160D Kinesin light chain repeat proteins 
[BLOCKS] BL01160C Kinesin light chain repeat proteins 
[BLOCKS] BL01160B Kinesin light chain repeat proteins 
[BLOCKS] BL01160A Kinesin light chain repeat proteins 
[SUPFAM] tetratricopeptide repeat homology 1e-07 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 8 
[PROSITE] KINESIN_LIGHT 2 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 5 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Kinesin light chain repeat 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 12.54 % 
[KW] COILED_COIL 4.96 % 

SEQ MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 

SEG 

PRD ccccchhhhhhhhhhhhhchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccchhhhh 

COILS 

SEQ LRUS LEA I E LGLG EAQV I LALS SH LGA V E S EKQ KLRAQVRRL VQ EN QWL RE ELAGTQQKL 

SEG 

PRO hhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhh 

COI LS CCCCCCCCCCCC 

SEQ QRS EQA V AQLEEE KQH LL FMSOI RKLDEDAS PNEEKGDVPKDTLDDLFPNEDEQS PAPSP 

SEG 

PRD hhhhhhhhMhhhhhhhhhhhhhhhhcccccccccccceccccccccccccccccccccc 

CO I LS CCCCCCCCCCCCCCCCCCC 

SEQ GGGDVSGQHGGYEI P A RLRT LH N L V I Q Y AS QG R Y E V A V P LC KQALE DLEKTSGH DH P DV A 

SEG 

PRO cccccccccccccchhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhhhhccccccchh 

COILS 

SEQ TMLN I LALVYRDQNK Y KEAAHLLNDALA X REKTLGKDH P AVAATLNN LAVLYGKRGKYKE 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhcchhhhhhhhhhhWihhhhhhcccccchhhhhhhhhhhhhcccccchh 

COILS 

SEQ AE PLCKRALE I REKVLGK FHPDVAKQLSMLALLCQNQGKAEEVEYY YRRALE1 YATRLGP 

SEG 

PRD hhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhccc 

COILS 

SEQ DDPN V AKTKNNLASC YLKQGKYQOAETL YKE I LTRAHEKEFGS VNGDNKPI WMHAEEREE 

SEG xxxxx 

PRD c ccc cc ehhhhhh hh hhhccc hhhhhhhh ft hhhh h h hh h h hcccc cccc chh h h hhhh h h 

COILS 

SEQ S KDKRRDSAP YGE YC5WYKACKVDS PT VMTTLRSLGALYRROGKLEAAHTLEDCASRMRK 

SEG xxxxxxxx 

PRD hhhhhccccccccccccc««««ccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ QGL DPASQTKWELLKDGSGRRGDRRS SRDMAGGAG PRS ESDLEDVGPTAEWNGDGSGS L 

SEG xxxxxxxxxxxxxx xxxxx 

PRO hhccctvhhhhhhhhhccccccccccccccccccccccccccccccceeeecccccccccc 

COILS 

SEQ RRSGS FG1CLRDALRRSS EMLVKKLQGGT PQEPPNPRMXRAS S LNFLHKSVEE PTQPGGTG 

SEG xxxxxxxxxx xxxx 

PRD ccccccchhhhhhhhhhhhhhhhhhcccccccccchhhhhhhcccecccccccccccccc 

COILS 

SEQ LSDSRT LS SSSMDLSRRS SLVG 

SEG xxxxxxxxxxxxxxxxxxxx . . 



PRD 
COILS 



c c c cc c c ccc c chhh hhhcc cc 



Prosite for DKFZphtes3_4h6.3 

PS00001   449->453   ASN_GLYCOSYLATION   PDOC00001 
PS00001   587->591   ASN_GLYCOSYLATION   PDOC00001 
PS00004   425->429   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   505->509   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   554->558   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   578->582   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   616->620   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   30->33     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   90->93     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   451->454   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   499->502   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   507->510   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   539->542   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   615->618   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   13->17     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   151->155   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   163->167   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   232->236   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   470->474   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   507->511   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   519->523   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   521->525   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   568->572   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   589->593   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   610->614   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   339->346   TYR_PHOSPHO_SITE    PDOC00007 
PS00007   339->347   TYR_PHOSPHO_SITE    PDOC00007 
PS00007   424->432   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   71->77     MYRISTYL            PDOC00008 
PS00008   86->92     MYRISTYL            PDOC00008 
PS00008   182->188   MYRISTYL            PDOC00008 
PS00008   187->193   MYRISTYL            PDOC00008 
PS00008   402->408   MYRISTYL            PDOC00008 
PS00008   482->488   MYRISTYL            PDOC00008 
PS00008   598->604   MYRISTYL            PDOC00008 
PS00008   600->606   MYRISTYL            PDOC00008 
PS00009   292->296   AMIDATION           PDOC00009 
PS00009   499->503   AMIDATION           PDOC00009 
PS00016   502->505   RGD                 PDOC00016 
PS01160   223->265   KINESIN_LIGHT       PDOC00893 
PS01160   265->307   KINESIN_LIGHT       PDOC00893 
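
Short Prosite motifs such as those listed for this protein can be located with regular expressions. The following is a minimal Python sketch, assuming the published PROSITE patterns for the five entries shown and a "start->end" convention in which start is the 1-based first residue and end is start plus the motif length (which is what the listed ranges suggest); the function name `scan` and the choice of test fragment are illustrative.

```python
import re

# PROSITE patterns rewritten as regular expressions (accession -> (name, regex)).
PATTERNS = {
    "PS00001": ("ASN_GLYCOSYLATION", r"N[^P][ST][^P]"),  # N-{P}-[ST]-{P}
    "PS00004": ("CAMP_PHOSPHO_SITE", r"[RK]{2}.[ST]"),   # [RK](2)-x-[ST]
    "PS00005": ("PKC_PHOSPHO_SITE", r"[ST].[RK]"),       # [ST]-x-[RK]
    "PS00006": ("CK2_PHOSPHO_SITE", r"[ST].{2}[DE]"),    # [ST]-x(2)-[DE]
    "PS00016": ("RGD", r"RGD"),                          # R-G-D
}

def scan(seq, first_residue=1):
    """Return sorted (accession, start, end, name) hits, overlapping matches included."""
    hits = []
    for acc, (name, regex) in PATTERNS.items():
        # A lookahead lets overlapping occurrences of the same motif be reported.
        for m in re.finditer("(?=(%s))" % regex, seq):
            start = first_residue + m.start()
            hits.append((acc, start, start + len(m.group(1)), name))
    return sorted(hits)

# Residues 501-510 of the peptide (RRGDRRSSRD).
for hit in scan("RRGDRRSSRD", first_residue=501):
    print("%s %d->%d %s" % hit)
```

On this ten-residue fragment the sketch reports the RGD site at 502->505 together with the CAMP, PKC and CK2 sites at 505->509, 507->510 and 507->511, matching the corresponding table entries.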



Pfam for DKFZphtes3_4h6.3 



HMM_NAME Kinesin light chain repeat 

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

♦ALED+EKT +GHDH FDVATMLN+ LALV+ R+QHKY + E+ + ++N 
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264 

50.46 265 306 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 

Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

AL +REKTLG DHP VA LNNLA++* ++KY+E+E + + 

Query 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306 

Outry 348 1 42 dkf zphtes3_4h6 . 3 strong similarity to Kinesin light chain 

Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY* 
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348 

39.10 349 390 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 

Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 

RALE + LG D P«-VA+ NNLA + Q+KY+++E +Y+ 

Query 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390 
DKFZphtes3_4o19 



group: testes derived 

DKFZphtes3_4o19 encodes a novel 1180 amino acid protein with weak similarity to human 
megakaryocyte stimulating factor and human mucin. 

The novel protein contains a cytochrome C family heme-binding site signature. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to megakaryocyte stimulating factor and mucin 

complete cDNA, complete cds, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3767 bp 

Poly A stretch at pos. 3751, polyadenylation signal at pos. 3737 



1 GGCTAGGTTT ACCTTCAGGG GCAGCCCAGG GCACTCTTGC TGCATATTGC 
51 ATCGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 
101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 
151 AGCTGACCTG TCCGGTAATC AAGGCAATGC AGCCGGCCGC CTAGCTACAG 
201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 
251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 
301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 
351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGCCTGTGGT CGAGACCCAG 
401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 
451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC 
501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 
551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 
601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 
651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 
701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGCTCCTCT GCAGACCCCA 
751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT 
801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGGGGCTGGC CTTCCTGCCA 
851 CACCAGACGG TCACCATCAG ATTTCCCTCC CCAGTGAGTT TGGACGCAAA 
901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 
951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 
1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAACCCAGCG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCAAA 
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA CAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA 
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC 
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTCT CTGGCCTCTC 
1701 CAACAGTGGC AAATCTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC 
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA 
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT 
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA 
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA 
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT CCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGCTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG 
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT 
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG 
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT 
2501 GGCCTCAGCG CCCCACCCTC CGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC 
2601 CGGCAGCCTG TGACCTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC 
TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene, 
partial cds., N = 1, Score = 204, P = 1.4e-12 

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11 



>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds. 
Length = 1404 

HSPs: 

Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 145/546 (26%), Positives = 198/546 (36%) 

Query: 282 KRVSARTNKARAPErrPLSRRYDQAVTRPSRAQTCCPVKAETPKAPFQIC-PGFMITKTLL 340 

KM T t U TP PS + P T AP P P TK+ 

SbjCt: 488 KKPA PTT F KE PAPTT P*KEPA PTTTKE PS PTT PKE PA PTTTKSAPTTTKPAPTTTKS A P 546 

Query: 341 QTYPVVSVTLPQ----TYPASTMTTTPPKTSPV-PKVTIIKTPAQKYPGPTVTKTAPHTC 395 

T ST + TP TTP K +P PK TP + f PT Tit 
SbjCt: 547 TTPKEPS PTTTKEPAPTTPKEPAPTT PKKPAPTTPKE PAPTT PRE- -PAPTTTKK 599 

Query: 396 PM PTMT KIQVHPTASRTGTP RQTC P AT I T AKN R PQVS LLA S I M K S L PQVC PG P AMAKT P P 455 

P PI K + PT TP+*T P T LA P +A T P 

Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653 

Query: 456 QMH P VTT P AKN P LQTC LS ATMS KT 5 S QRS P VG VT K PS PQT - RI* P AM I T - KT PAQLRS V AT 513 

+ TTP + P T A T + +P *P+P T ♦ PA T K A T 
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKE PAPTT PKE FA PTTPKEPAPTTPKETAPTTFKGT 712 

Query: 514 I LKTLCLASPTVANVKAPPQVAVAAG TPNTSCSIHENPPKAKATVNVKQAAKW-KA 569 

TL +PT AP + *A T TS PK A K* A K 

SbjCt: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKPAPTTPKGTAPTTPKEPAPTTPKE 772 

Query: 570 S S P S Y LA EG K I RC LAQP H PGTG V PRAAAELPLEAEK I KTGT - -Q KQAKT CHAFKT S V A VE 627 

♦P+ L +P P T A EL K T T HAT +T+ 

SbjCt: 773 PAPTTPKGTA PTTLKEPAPTTPKKPAPKELAPTTTKG PTSTTSDK PAPTT PK - ETAPTTP 831 

Query: 628 HAG A PS WT K V AE EGDK P PH V Y V P V DMAVTL PRGQLAA P LT N AS SQRH P PC LSQR P LAA PL 687 

AP+ K + P P V+P + S PLSP L 

Sbjct: B32 KE PAPTT PK - - K PAPTTPET PP PTTSEVST PTTTKE PTT I HKS PDESTPELSAE PTPKAL 8B9 

Query: 688 TKASSQGHLPTELTKTPSLA-- HLDTCLSKMHSQTHLATGAVKVQSQAPLAT— CLTKTQ 743 

+ + *PT TKTP+ + T n LI ♦ ♦ AP T T T+ 

SbjCt: 890 ENSPKEPGVPT--TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946 

Query: 744 S RGQ P I TD I TTC L I P AHQAA DL S - - S NTHSQVLLTG S K V S N • * KACQRLGGLS AP P- WAK 798 

+ TT + * 0+ T + KV+ + + P AK 

SbjCt: 947 KTT ES K I TATTTQVT STTTQDTT P FK I TT LKTTT LAPKVTTTKKTI TTT EIHKKPEETAK 1006 

Query: 7 99 PEDRQTQPQPHGHVPG KTTQGG PC P AA 825 

P+DR T ♦ HI' P + 

SbjCt: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033 

Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 146/565 (25%), Positives = 209/565 (36%) 

Query: 281 TKRVSARTNKARAPETPLSRRYDQAVTRPSPAQTQGPVKAE--TPKAPFQICPGPMITKT 33B 

TK+ + K AP TP +ATP+ PK TP* P P + T 

SbjCt: 597 TKKPAPTAPKEPA PTT P K ETAPTTPKKLT PTT PEKLAPTT PEK PAPTT PEELA PTT 652 

Query: 339 LLQTYPWS VTLPQT YPASTHTTTPPKTS Pv- PKVT 1 1 KTPAQMY PG PTVTK-TAPKTCP 396 

+ P TP* TP ♦ +P PK TP * P PT K TAP T P 

SbjCt: 653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE--PAPTTPKETAP-TTP 709 

Query: 397 M PTHTKIQVH PTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453 

PT K + PT ♦ P + +PT ♦ S ♦ KP GA T 

SbjCt: 710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTS D- - KPAPTT PKGTAPT -T 761 

Query: 454 P PQMH PVTTPAKNP LQTC LSATMSKTSSQRS P VGVTK PS POT RLPAMITKTPAQLRS VAT 513 

P + P TTP KPT T T + +P KP+P* P TK P S 
SbjCt: 762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP B18 

Query: 514 IIJtTI^LASPTVANVKAPPQVAVAAGTPHTSGSIHEIJPPKAKATVNV KQAAKWKA 569 

T + PT AP APT E PP ♦ V+ K+ + K+ 

Sbjct: 819 APTTPKETAPTTPKEPAPTTPKKPA— PTTP ETPP PTTSEVST PTTTKE PTT I HKS 872 

Query: 570 - - - S5PS YLAEGKI RCLAQPHPGTGV PRAAAEL PLEAEKI KTGTQKQAKTDMAFKTS VAV 626 
S+P AE ♦ L GVP + P + T T K T* +T + 
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG 
2701 TGCCCAGCCA TCAATCCCCG GCCAGGCGGT 
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC 
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC 
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT 
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA 
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG 
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG 
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT 
3101 CGGGACCAAG CCCGGCACTG CCAGATGCTC 
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA 
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT 
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC 
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC 
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG 
3401 GGCCCCGGGG CAGTCTCTTG GGCCTCCGCC 
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC 
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA 
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC 
3601 GAAGAACACA GAGGCGCTCT TGGGACCAGC 
3651 CGCACATGCA TTGGCCTGGC ATCTAGGACC 
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG 
3751 ACAGCCTAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 



1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA 

51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD 
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK 

151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL 

201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP 

251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR 

301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL 

351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM 

401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM 

451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM 

501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN 

551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP 

601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP 

651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL 

701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT 

751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE 

801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG 

851 DMGATRAQPS HPGQAVPCQE DTGPADAGVV GCQSWNRAWE PARCAASWDT 

901 WRNKAVVPPR RSGEPMVSKQ AAEEIRILAV ITIQAGVRGY LAKRRIRLWH 

951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH 

1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS 

1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGHGQGTEG PGAVSWASAY 

1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH 
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2 

TREMBL:KSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds., N - 2, Score - 
242, P - 9.6«-16 



GGAGACAACG 
GCCCTGCCAG 
AATCGTGGAA 
ACCTGGCGCA 
GGTGTCCATG 
TCCAGGCGGG 
CACCGGGGGG 
GCGCAACCTG 
GGCGCGGCTA 
CACCCCGTCA 
CCGAAGCTGG 
GCTTCCAGTC 
TCCAGGATCG 
TCGCACCTGT 
GCATGGGCCA 
TACCAGCTGG 
GGCCACAGCC 
TCAGGCAGCA 
CACCATACCC 
AGACCCCTCG 
CTGGCTCCCT 
TCTAATGAAT 



GAGCCACACG 
GAGGACACGG 
CCGCGCATGG 
ACAAGGCGGT 
CAGGCTGCAG 
CGTCCGTGGC 
CCATGGTCAT 
GCACACCTCT 
CAGCACCCGC 
CGTGGGTGGA 
TTCCAGGATG 
CTCCCAGGCA 
GGAGCCCGCC 
CATACCTCTG 
GGGCACTGAG 
CTGCCCTGAG 
ATCCAGTCCG 
GCAAATGGCA 
GGAGCTGTCT 
GCCAGCTCAC 
GCAGTGGCGA 
AAAGTCCTCC 



WO 01/12659 

PCT/IB00/01496 



Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP--TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930 

Query: 627 EMAGAPSWTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 

A AP TK A +K + +T Q+ + T ++ L LA 

Sbjct: 931 TT A - A P KMT KETATTTEKT TESKI TATTTQVTSTTTQDTT P TKI TTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 

+T + + TE+ P +T K + AT K Q + P +T 

Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQK PTKAPKKPTSTKKP 1037 

Query: 741 KTQS R-GQPITDIT TCLIPAMQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 

KTR+PTT T+P + Q + + N + S 

Sbjct: 1038 KTHPRVRKPKTTPTPRWCTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVHPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845 

A+ E + PH + P T P QG+ + + PM + CN 

Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDMDYLPRVFN-QGIIINPMLSDETNICN 1147 

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11 
Identities = 142/513 (27%), Positives = 200/513 (38%) 

Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263 
R + P +PP G + H V+ + +P L 
Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 R--TIRSTCLVHIEGDSVKTKRVSARTNKARAP ETPLSRRYDQAVTRPSR AQTQ 315 
T + T L + +V+TK + TKK + E S + Q+ + + S AT 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 
+ TPKA GP +T T + P T P+ PAST TP + + P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE-PAST-- TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC--PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432 
TP + P PT TK+AP T P PT TK + PT + F T PA T K+ P 
Sbjct: 376 TTPKE--PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432 

Query: 433 LLAS I MKS LPQVC PGPAMAKT P PQMH P VTT P AKN PLQTC LS ATMS KT SSQRS PVGVT 489 
+ K P PA TP + P TTP K P T + T + +P 
Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TT P-KEPA PTTKEPAPT-TPKEPAPTAPK 488 

Query: 490 KPS PQT - RLPAM IT- KTPAQLRS VA T I LK TLCLAS PTVANVKAPPQVAVAAGT 540 
KP + P T + PA T K PA + TK T ++PT AP AT 
Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 
P S + + P PK A K + A K +P+ E + P P P+ 
Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA--PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 
A+ P ++ T K+ F + AP+ + +A + P P + 
Sbjct: 607 EPA--PTTPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTPEEPTPTTPEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 
A T P+ AAP T +PP+PAPT PET T 
Sbjct: 665 APTT PKA--AAPNT PKEPAPTTPKEP--APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 
Sbjct: 717 LKE 719 

Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02 
Identities = 60/214 (28%), Positives = 85/214 (39%) 

Query: 265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA--RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 
T + +H D T + SA T KA +P+ P + A T+P T 
Sbjct: 862 TTKEPTTIHKSPDE-STPELSAEPTPKALENSPKEPGVPTTKTPAATKPEMTTTAKDKTT 920 

Query: 321 ETP--KAPFQICPGPMITK-TLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTIIKT 377 
E P P +TK T T + T T TTT T+P K+T + KT 
Sbjct: 921 ERDLRTTPETTTAAPKKTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978 

Query: 378 PAQMYPG PTVTK TAPHTC PMPTMT- KIQVHPTASRTGTPRQTC PATITAKNRPQVSL 433 
+ P T TK T PTK+ TS+ TP+ P A +P + 
Sbjct: 979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPFDRATNSKATTPKPQKPTK--APKKPTSTK 1035 

Query: 434 LASIMKSL--PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 
M + P+ P P H T P+ ++P + A+ LQT 
Sbjct: 1036 KPKTMPRVRKPKTTPTPRXMTSTMPELNPTSRIAEAMLQT 1075 




Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 17/60 (28%), Positives = 22/60 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P PS E AP P+ + K+ P P E + + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592 

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 17/59 (28%), Positives = 22/59 (37%) 

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA + + 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489 

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15 
Identities = 15/51 (29%), Positives = 19/51 (37%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71 

T EP T P P P+ + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 12/41 (29%), Positives = 17/41 (41%) 

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

P P P + P +? +KS P++PA T S 

Sbjct: 350 PTPTTPK--EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 15/57 (26%), Positives = 19/57 (33%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77 

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433 

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15 
Identities = 16/58 (27%), Positives = 22/58 (37%) 

Query: 20 LATVHEPVVT QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74 

L T EP T + A P F+ + P +P KS P++PA T 

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401 

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14 
Identities = 15/60 (25%), Positives = 21/60 (35%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P P+ + AP P+ + KE P E + P 

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522 

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14 
Identities = 15/55 (27%), Positives = 20/55 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT PKE PA PTTT KEPS PTT P K E P A PTTT KS A PTTT KE P APTTTKS 544 
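
The percentages printed beside the identity and positive counts in the HSP headers above look truncated rather than rounded: 17/59 is listed as 28% although the quotient is about 28.8%. The following is a minimal Python sketch that recomputes them under that assumption. The function name `truncated_percent` is illustrative only and is not part of any BLAST program.

```python
def truncated_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matches over aligned columns, truncated (assumed)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts copied from the HSP headers in this report.
for matches, aligned in [(17, 59), (22, 59), (16, 58), (22, 58)]:
    print(f"{matches}/{aligned} -> {truncated_percent(matches, aligned)}%")
```

A check of this kind is useful when a scanned header has lost a digit, because the fraction and the bracketed percentage constrain each other.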



Pedant information for DKFZphtes3_4ol9, frame 2 

Report for DKFZphtes3_4ol9.2 

[LENGTH] 1180 
[MW] 127693.40 
[pI] 10.25 
[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 6e-06 
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06 
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 6e-06 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06 
[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 12 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] PKC_PHOSPHO_SITE 25 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 5.00 % 

SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK 
SEG 
PRD cccecce«»ccccccc«ee*e«««c««e«i«««ccccccccee«ecccceceecceccce 

SEQ SKEHLPQQPAEGKTASRRVPRLRAVVESQAFKNILVDEMDMMHARAATLIQANWRGYWLR 
SEG 
PRD cccccccccccccccccchhhhhhhhhhhhhhh«*«hhhhhhhhhhhhhhhhhccchhhh 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhe***eccchhhhhhhhcccccccccceeeecccccce 

SEQ LLSPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH 
SEG 
PRD ••ccc«ee«cccccccccc««t«cccccccccccccccccccccccccctt«««««eccc 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIECDSVKTKRVSARTNKARAPETPLSR 
SEG 
PRD eeee«acccccccccccccccccccccceeeeecccccccc««oeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT 
SEG xxxx 
PRD ccceee eecccccccccee *cccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 
SEG xxxxxxxxxxxxx 
PRD cccccccccccc««eeccccccccccccccccceeccccccc««*ccccccccccccccc 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 
SEG 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 
SEG 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP 
SEG xxxxxxxxxxxxxxxxx xxxxxx 
PRD ccccccceeeccccccccccccccccccccccccccccccccccccccccccccccccce 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG 
SEG xxxx 
PRD cccccccccccccccccccccceceececcceccceeeeccccccceMCcccccccccc 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT 
SEG 
PRD ccccccgccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAMQAADLSSNTHSQVLLTGSKV 
SEG 
PRD ccccce«oeeccccceeeecccccccccccccccccccccccccccccccee«««ccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 
SEG 
PRD ccceccccccccccccccccccccccccccceeccccccccccccccccccccccccccc 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGVVGGQSWNRAWEPARGAASWDT 
SEG 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ WRNKAVVPPRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW 
SEG 
PRD ccc«««cccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhhhhhhhh 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSVVMLVGSSPRTCHTCGRTQPTRV 
SEG 
PRD hcccee««ccceeeecccc«e«e*eeecccccccccceeeee«ccccccccccccccc«e 

SEQ VQGMGQGTEGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA 
SEG xxxxxxxxxxxxx 
PRD B««ccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI 
SEG xx 
PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc 

Prosite for DKFZphtes3_4ol9.2 

PS00001 542->546 ASN_GLYCOSYLATION PDOC00001 
PS00001 668->672 ASN_GLYCOSYLATION PDOC00001 
PS00004 282->286 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005 
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005 
PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005 
PS00005 265->268 PKC_PHOSPHO_SITE PDOC00005 
PS00005 278->281 PKC_PHOSPHO_SITE PDOC00005 
PS00005 281->284 PKC_PHOSPHO_SITE PDOC00005 
PS00005 285->288 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005 
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005 
PS00005 424->427 PKC_PHOSPHO_SITE PDOC00005 
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005 
PS00005 610->613 PKC_PHOSPHO_SITE PDOC00005 
PS00005 671->674 PKC_PHOSPHO_SITE PDOC00005 
PS00005 679->682 PKC_PHOSPHO_SITE PDOC00005 
PS00005 900->903 PKC_PHOSPHO_SITE PDOC00005 
PS00005 959->962 PKC_PHOSPHO_SITE PDOC00005 
PS00005 987->990 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1015->1018 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1049->1052 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1065->1068 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1106->1109 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1146->1149 PKC_PHOSPHO_SITE PDOC00005 
PS00005 1171->1174 PKC_PHOSPHO_SITE PDOC00005 
PS00006 22->26 CK2_PHOSPHO_SITE PDOC00006 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006 
PS00006 848->852 CK2_PHOSPHO_SITE PDOC00006 
PS00006 988->992 CK2_PHOSPHO_SITE PDOC00006 
PS00006 1003->1007 CK2_PHOSPHO_SITE PDOC00006 
PS00006 1027->1031 CK2_PHOSPHO_SITE PDOC00006 
PS00008 11->17 MYRISTYL PDOC00008 
PS00008 14->20 MYRISTYL PDOC00008 
PS00008 539->545 MYRISTYL PDOC00008 
PS00008 591->597 MYRISTYL PDOC00008 
PS00008 746->752 MYRISTYL PDOC00008 
PS00008 777->783 MYRISTYL PDOC00008 
PS00008 853->859 MYRISTYL PDOC00008 
PS00008 878->884 MYRISTYL PDOC00008 
PS00008 882->888 MYRISTYL PDOC00008 
PS00008 1008->1014 MYRISTYL PDOC00008 
PS00008 1053->1059 MYRISTYL PDOC00008 
PS00008 1083->1089 MYRISTYL PDOC00008 
PS00190 1042->1048 CYTOCHROME_C PDOC00169 

(No Pfam data available for DKFZphtes3_4ol9.2) 

DKFZphtes3_50j4 



group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown, proline-rich protein 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 



1 CACTGGGCCT CTGAAGCTCA GAGCTCACCC CTCAGATCGG CTCTCCTAGG 
51 CCTCCTGGGA TGAGCGAGCC ACCACGACCC AGTCCTGTGA TGCCTGCTCT 
101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGCGCACC CCTGAAGTCC 
151 AGCCCACCCC TGCAAACGAC ACATGGAACG GCAAGCGGCC TCGATCCCAG 
201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGCCCACCCC CCTCAGCCAA 
251 GCCCTCCCTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCCAACAGG 
301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCTCCTGGC 
351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 
401 CTACAACGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 
451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 
501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 
551 GTGCGAGAGC GAAGCTCACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 
601 CAACTGCTGG CTGGGCAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 
651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 
701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 
751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 
801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG GGACACATGT 
851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 
901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 
951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 
1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGCTCAG AGCTGTGCCA 
1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 
1101 CAGGCCTGAC AGATCTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 
1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 



1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 
51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 
101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS 
151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 
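
The ORF coordinates, reading frame and peptide length reported for each clone are tied together by simple arithmetic, which is handy for spotting digits damaged in scanning. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted in the span; both assumptions fit the figures printed in this report (36 to 596 bp, frame 3, 187 residues). The helper name `orf_summary` is hypothetical.

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (frame, peptide_length) for a 1-based, inclusive ORF span."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    frame = (start_bp - 1) % 3 + 1  # frame 1 starts at base 1
    return frame, span // 3


print(orf_summary(36, 596))
print(orf_summary(134, 3673))
```

If a reported end coordinate does not give a span of exactly three times the peptide length, one of the two numbers has most likely been misread.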

BLASTP hits 



Entry MMU92455_1 from database TREMBL: 






product: "WW domain binding protein 7"; Mus musculus WW domain binding 
protein 7 mRNA, partial cds. 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125 



Alert BLASTP hits for DKFZphtes3_50j4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 



Report for DKFZphtes3_50j4.3 



[LENGTH] 187 
[MW] 20353.06 
[pI] 9.76 
[PROSITE] MYRISTYL 1 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 6 
[PROSITE] PKC_PHOSPHO_SITE 6 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 

SEQ 
SEG 
PRD 

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT 
SEG 
PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc 

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH 
SEG 
PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh 

SEQ GLCGPQR 
SEG 
PRD ccccccc 



Prosite for DKFZphtes3_50j4.3 

PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005 
PS00005 70->73 PKC_PHOSPHO_SITE PDOC00005 
PS00005 107->110 PKC_PHOSPHO_SITE PDOC00005 
PS00005 146->149 PKC_PHOSPHO_SITE PDOC00005 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 84->88 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006 
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006 
PS00006 175->179 CK2_PHOSPHO_SITE PDOC00006 
PS00008 81->87 MYRISTYL PDOC00008 
PS00009 48->52 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_50j4.3) 

DKFZphtes3_50n06 
group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1085, polyadenylation signal at pos. 1061 

1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCCGAAGGT 

51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 

101 GACCTGCTCT GCTCACATGC CCCCCTGTCC ACCGAGGACG ACACCTCCCC 

151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 

201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCCCTCCTG 

251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 

301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 

351 GGGACAGGAC CAGGAGGAGC TACTACCTCA ATGAGATCCA GAGCTTCGCG 

401 CGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 

451 CCGCCCCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 

501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 

551 CTGGACGGCT CCGTGGACGA GAGCAAGCTG CGCGAGCTGA CGCAGCGCTA 

601 CCTCGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 

651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 

701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC ACCAGCCCGG CCGCGCTGCG 

751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTCC 

801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAACCCCCTC 

851 TTCGCCTGCT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 

901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCCCGTCCC CCCGGAGGGG 

951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 

1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGCAGTGCT 

1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 
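
The header for this clone lists a poly A stretch at position 1085 and a polyadenylation signal at position 1061. For the last sequence row above, those values coincide with zero-based offsets of the trailing A run and of an upstream ATTAAA hexamer. The Python sketch below reproduces that lookup under stated assumptions: only the hexamers AATAAA and ATTAAA are searched, a tail needs at least eight A residues, and positions are zero-based. How the original annotation pipeline located these features is not stated in the report, so `polya_features` is an illustrative helper, not its actual method.

```python
import re


def polya_features(seq: str, offset: int = 0, min_tail: int = 8):
    """Return (signal_pos, tail_pos) as zero-based offsets, or None where absent."""
    seq = seq.upper()
    tail = re.search(r"A+$", seq)  # trailing run of A at the 3' end
    if tail is None or len(tail.group()) < min_tail:
        return None, None
    upstream = seq[: tail.start()]
    # Last occurrence of either assumed signal hexamer before the tail.
    signal = max(upstream.rfind(h) for h in ("AATAAA", "ATTAAA"))
    signal_pos = offset + signal if signal >= 0 else None
    return signal_pos, offset + tail.start()


# Last row of the DKFZphtes3_50n06 insert; the row starts at base 1051.
last_row = "TCTCACCCCT" "CATTAAAATC" "ATCCGTTTGC" "TTGTCAAAAA" "AAAAA"
print(polya_features(last_row, offset=1050))
```

Passing the row's starting offset keeps the example short; running it on the full insert with `offset=0` would give the same positions if the upstream rows contain no later hexamer match.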



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 



1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD 

51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 

101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 

151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n06, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50n06, frame 2 

Report for DKFZphtes3_50n06.2 

[LENGTH] 186 
[MW] 21049.39 
[pI] 9.28 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 5.38 % 

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 
SEG 
PRD ecccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 
SEG 
PRD c e ee e • « t «e • cccccccc c c c ccccccftti h hh hhh hhhh hhhhhh hhh h c c c c cc c c c h 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG 
SEG xxxxxxxxxx 
PRD hhhhhhccee • cccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhg ccc 

SEQ KPLFAW 
SEG 
PRD cccccc 
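
The LOW_COMPLEXITY value in the report header can be cross-checked against the SEG rows, where each `x` marks a masked residue. Ten masked positions over 186 residues give the 5.38 % listed for this clone. The following Python sketch performs that check; treating the figure as masked residues divided by peptide length, rounded to two decimals, is an assumption that happens to match the reported values, and `low_complexity_percent` is an illustrative name.

```python
def low_complexity_percent(seg_rows: list[str], peptide_length: int) -> float:
    """Percentage of residues masked with 'x' in the SEG rows (assumed definition)."""
    if peptide_length <= 0:
        raise ValueError("peptide length must be positive")
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100 * masked / peptide_length, 2)


# SEG rows of DKFZphtes3_50n06.2: only the third row carries a mask.
seg_rows = ["", "", "xxxxxxxxxx", ""]
print(low_complexity_percent(seg_rows, 186))
```
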



(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2) 

group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 
2 EST hits 

(from other testis libraries) testis-specific cDNA? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872 



1 GGGCACCAGC CACTTTCCAC CATGACTGTG CCCTCGAGCG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTCA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGCAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATCTGGC AGCAGCGGCA 
351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGACAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGACAG CCAGAGCAGC TAGGGGAGGA 
501 TCTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC ACCAGCCTGC 
651 CCTGGGAAAC CAGACACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GCACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTCT GCCATAAGTA CATCTTCTAT AGACGCCTCC 
1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT CAAAGAAACG 
1051 GAGGCTTCCT ACAACGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA ACTTGCCTGC AGCCTCACCC CGCCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTCACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGAACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGCCACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATCCAGGT TCCCAGCTCG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 

Peptide information for frame 1 



ORF from 22 bp to 1518 bp; peptide length: 499 
Category: similarity to known protein 
Classification: no clue 



1 MTVRSRVADV FGSKDTESLE 
51 EDYFQKCCLQ IKFHCSKQLS 
101 EEEEHWQQRQ KKWALLEQEH 
151 RREPEQLGED VERB I FT PT 5 
201 PNSPSTGQPA LGKQRPKSSV 
251 LTWPSLQISP ANIKKKVYHH 
301 LTTTTMELGA LRLQYLCHKY 
351 NLYIFLENID RLQSLRLQAW 
401 VHLNIPEVTS PKPKKCKLPA 
4 51 RQQGKQHEAV WKTEVASSSY 



Ho BLASTP hit a available 

Alert BLASTP hits for DKFZphte»3_50n23, frame 1 

PIR:S2B589 trichohyalin - rabbit, N - 1, Score - 134, P - 5.3e-Q5 

TREMBLNEW: AF132479 1 product; "E3*2L protein"; Kus mus cuius Ese2L 

protein mRNA, complete cds . , N - 1, Score - 130, P - 0.00017 



>PIR:S2B589 trichohyalin - rabbit 
Length - 1,407 

HSP9 : 



Score - 134 (20.1 bits), Expect - 5.3«-0S, P - 5.3e-05 
Identities - 86/354 (241), Positives - 154/354 (431) 



Query: 


29 


RRF P K KWERP VAES LCH K DK DQE D Y FQKGG LQI K- FH C5 KQ LS LESSRQVTS E5 QEEPWE 


87 




R*+ K +R + L * ++E + + G + F *QL +++ E *EE + 




Sbjct: 


165 


RQ Y R D KEQRLQRQE LEE RRAEEEQLRRRKG R DAE E F I E EEQ LRRREQQ E LK RELREE EQQ 


224 


Query: 


as 


E E FG R EMRRQLWLEE EEMWQQRQK KW A L L EQ EH QEKL RQWN LE DLAREQQRRWVQ LE K EQ 


147 




RE + L+EEE RQ++W E Q++LR+ LE++ RE+**R Q E+ + 




Sbjct: 


225 


RRERREQHERA-LOEEEEQLLRQRRURE - EPREQQQLRR-ELEE I -REREQRLEQEERRE 


280 


Query: 


148 


ESPRREPEQLGEDVERRIFTPTSRWROLEKAELSLVPAPSRTQSAHOSRRPHLPMSPSTQ 


207 






+ RRE ++L E ERR +++EL ROQR + + 




Sbjct: 


281 


QQL RRE- QRL - EQEE RR EQQLRR ELE E I RE REQRLEQEERREQRLEQ E ERRCOQLKRELE 


338 


Query: 


208 


QPALGKQRPNSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 


266 




+ +QR + E RR+++++A GS+RW SA++K 




Sbjct : 


339 


EIREREQR LEQEER- REQL LAE £V REQAR — E RGES LT R- RWQRQLES EAGARQS K 


390 


Query: 


267 


V Y HH OHEAQRKNLQL LS EE S EL RL P H YLRS KAL ELTTTTM ELGALRLQY LCHKY 


320 






VY +R* Q L ++ E R R ♦ LE E R Q L + 




Sbjct: 


391 


VYS R P RRQEEQS LRQDQE RR -QRQ E R E RE L EEQA RRQQQWQ A E EES ERRRQRLS ARP 


446 


Query: 


321 


I FY RRLQS LRQEA I NHVQI MKETEAS YKAQNL YI -FLENI DRLQ5 L- R LQAWT DKQKG LE 


378 






RQ +E Q +E E ++ + FLE ++LQ R Q ♦+ E 




Sbjct: 


447 


SLRE R - QLRA EERQEQEQRFREEEEQ RRERRQELQFL EEEEQLQRRERAQQLQE E OS FQE 


505 


Query: 


379 


EKHR 382 
+ + R 




Sbjct: 


506 


DRER 509 




Score 


- 119 


(17.9 bits), Expect - 2.2e-03. P - 2.2e-03 





Identities - 79/357 (221), Positives - 150/357 (421) 



Query: 


33 


KKWERPVAESLGHKDKDQEOYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 


92 




++ E+ + + K +++E Q+ + + +Q R+ + + ♦ EE+F + 




Sbjct: 


990 


RREEQELRQERDRKFREEEQLLQE- - - REEE RLRRQERD RK FREEERQL RRQE L EEQ FRQ 


1046 


Query: 


93 


EKRRQLH LEEEEKMQQRQKKM ALLEQEKQEK LRQWN LED LAREQQRRWVQL EK EQES PRR 


152 




E R+ LEE+ *■ Q++++K L QE K R* E+ R +Q R QL *£** R 




Sbjct: 


1047 


ERORKFRLEEQ- 1 RQEK £ EK - QLRRQERD RX FRE EEQQRRR QEREQQLRRERO RK FR 


1101 


Query: 


153 


EPEQLGEDVERR1 FT PTSRWRDLEKAELSLVPA PSRTQS AHQSR- -RPHLPHS PSTQQPA 


210 



PVLLPLVORR FPKKWERPVA ESLGHKDKDQ 
LESSRQVTS E SQEEPHEEEF GREMRRQLHL 
QEKLRQWHLE DLAREQQRRW VQLEKEQESP 
RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 
EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 
DKEAQRKNLQ LL5EESELRL PHYLRSKALE 
I rY RRLQS LR QEAI NHVQIM KETEASYKAQ 
T DKQKG LEEK HRECLSSMVT MFPKLQLEWN 
ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 
AIEKKT PASL PRDQLRGHPD I PRLLTLDV 

BLASTP hits 
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E EQL + + E R R L ♦ E L+ + + B R + +++ 

Sbjct: 1102 EE EQLLQERE EE R LRRQERARKLRE EE -QL L RRE EQL LRQERDBK rR EEEQLLQES E EE R 1160 

Query: 211 LGKQ RPMS5 VEFT Y RPRTRRVPTK PKKS AS FPVTGTS I RRLTWPSLQI 5 PANIKKKV 267 

L +Q R + E ♦ B ♦ ♦ + ♦ + «♦ Q ♦♦♦+ 

Sbjct: 1161 LRRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQERARKLREEE 1220 

Query: 268 YHMDMEAQ BKNLQLLS-EESELBLPHYLRSKALELTTTTMELGALRLOYL 316 

+ E Q R+ QLL EE ELB ♦ + E E LB Q 

Sbjct: 1221 QL L RQEEQE LRQERD RK FREEEQL L R R E EQE LR RE RDRK FR EE EQL LQERE EER L R BQE R 12B0 

Query: 317 CHK Y I F YRRLQS LRQEA I N HVQIMKETEAS Y KAQNL Y I FLENI 0 RLQ- S L RLQAWT DKQK 375 

K ♦ L E ++ +E + Y+A+ + E RL+ LR + + + + 

Sbjct: 12B1 ARK — LREEEEQLLFEEQEEQRLRQERDRR YRAE EQFAREEKSRRLERELRQEEEQRRRR 1338 

Ou«ry: 376 GLEEKHRE 383 
E X RE 

Sbjct: 1339 ERERKFRE 1346 

Score - 109 (16.4 bits), Exp;ct - 1.9e-01, P - 1.7e-01 
Identities - 37/113 (32%), Positives - 60/113 (531) 

Query: 67 KQLS LESS RQVTS ESQ — EEPHEEE FGREMRRQ LK LEEEEKHQQRQKKW A LLEQEH QEK L 124 

♦QL E R+ E Q +E EE R+ R + EEE++ Q+R*++ L QE + KL 
Sbjct: 764 QQLRRERDRK FRE EE QL LQEREEER LR RQERERKLRE E EQ L LQ EREE E - ALRRQERE RKL 822 

Query: 125 RQWH LEDLAREQQRRWVQLEX EQESP RRE PEQLGEDVERRI FTPTS RHRD LEKAE 179 

R+ EL +E + * + E+E RE EQL E+ + R B L + E 

Sbjct: 823 REE — EQLLQEREEEBLB- RQERERKLREEEQLLRQEEQEL-'RQERARKLREEE 672 

Score - 107 (16.1 bits). Expect - 3.0e-01, P - 2.6e-01 
Identities - 35/109 (32%), Positives - 61/109 (55%) 

Query: 71 LESS RQVT SESQEEPWE-EE FGREMR RQL — - H LEEEEMWQQRQKKWALLEQE HQEKLRQ 126 

L Q+ ES+EE +E ♦♦+RR+- ♦ EEE++ QtR+*+ L QE + KLR* 
Sbjct: 742 LREEEQ LLQE5 EEERLRRQE R EQQL RRE RDRX FREE EQL LQE R E E E - RL RRQE RERK LRE 800 

Query: 127 WN LEDLAREQQRRW VQLEK EQES P RREP EQLG EOVERR I FT PT S RWRDL EKA E 179 

E L + E+ + + E+E RE EQL +♦ E B R L ♦ E 

Sbjct: 801 E - - EQ LLQ E RE E ERLR - RQERER K LR E EEQLLQE RE EERL RRQE RE R K LREE E 850 

Score - 104 (15.6 bits). Expect - 9.4e-02, P - 9.0e-02 
Identities - B4/339 (2(%), Positives - 149/339 (43%) 

Query: 67 KQLS L E S S RQVTS ESQEEPWEEE FG REMRRQL - WLEEEEHWQQ RQK KWALLEQE — HQEK 123 

+QL E ++ *EE EE RE R++L +LEEEE Q+R+* L £♦ + + 
Sbjct: 451 RQLRAEERQEQEQRFBEE EEQRRE RRQE LQ FLE E EEQLQRRERAQQLQE E DS FQE D R 507 

Query: 124 L RQWNLED LAREQQR RWVQLEKEQ ES P RB EP EQLGEOVE-RRI FT PTS RWRDL 175 

B* ++ Q RW QL+*E + R +P EQL E+ E *R R B+ 

Sbjct: 508 E RRRRQQEQR PGQTW RW - Q LQEEAQRRRHT L Y A K PGQQEQLREE E E LQREKRRQE R ER E Y 566 

Query: 176 EKAELS L V PAP S RTQ5 AHQS RRPH L PKS P S TQQP A LGKQRPHSSVEFTYRPRT RRV 231 

+ EL + + R+ + Q+L + R+ E ♦ R RR 

Sbjct: 567 RE EE - K LQREE DEK RRRQ ERE RQ Y RE LEEL RQE EQ L - RD RK L REEEQLLQEREEE RLRRQ 624 

Query: 232 PT KPK KSAS FPVTGTS I RRLTWP5 LQI 5PANI K KKVYHMDMZAORK — -NLQLLSEE 285 

+ K + *R* L* + + + + + E +RK QLL E 

Sbjct: 625 E RERK LRE E EQ LLRQE EQ E L RQERERK L REE EQ LLRRE EQ E L RQE RERKLREEEQ LLQER 684 

Query: 286 SELRLPHYLRSKALE LTTTTHELCALRLQYLCHKYIFYRRL-QSLRQEAINHV-- 337 

E RL R++ L L ELR+L+ RR 0 LRQE ♦ 

Sbjct; 685 E E ER L RRQERARK LREEEQL LRQE EQE L RQERERK LREE EQL LRRE EQL LRQE R DRK LRE 744 

Query: 338 --QIMKETEASYKAQNLYIFLEKIDRLQSLRLQAWTDKQKGLEEKHRECL 385 

Q+++E+E + E +L+ R + + + + + * L+E* E L 

Sbjct: 745 EEQLLQESEEERLRRQ EREQQL RRERORK FR EEEQLLQE RE EERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01 
Identities = 42/152 (27%), Positives = 74/152 (48%) 

Query: 36 ER P V AES LGHKDKDQED YFQKGG LQI KFHCS KQLS LESS RQVTS ESQEEPHEEEFG -REM 94 

ER + K +++E + ♦ + + + + L E ♦ + E QE E + RE 

Sbjct: 835 ERLRRQE RERK LRE EEQL LRQEEQELRQERARK LR - EE EQL LRQEEQELRQE RDRKLREE 893 

Query: 95 RRQ LWL E EEEMWQQRQK KW A L LEQEH QE K LRQWN L EOLAREQQ RRWVQ-LEKE 146 

+ L EE+E+ Q*R + K LL++ +E+LR+ E RE*+ RR Q I *I 

Sbjct: 894 EQLLRQE EOEL ROE RDRK LRE EEQL LQES EE E RLRRQE R ERK LREEEOLLRREEQELRRE 953 

Query: 147 QESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
+ RE EQL + + E R R L + E 
Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986 

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01 
Identities = 31/91 (34%), Positives = 52/91 (57%) 

Query: 67 KQLS LES S RQVT S E S QE E PW EE E EGR EMRRQLW LEEEEMWQQRQK K W ALL EQEHQ £ KLRQ 126 

++L E R++ E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 
SbjCt: 642 QE LRQERE RKL REEEQL L RREE QE LRQERERK L REEEQL LQE REE E- RL RRQERARKL RE 700 

Query: 127 WN L E DLAREQQRRW VQL EK EQE 5 P RRE PEQL 157 

E L R++++ +L +E+E RE EQL 
Sbjct: 701 E— EQLLRQEEQ EL RQE RE RKL REE EQL 726 

Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01 
Identities = 38/111 (34%), Positives = 57/111 (51%) 

Query: 72 ESS RQVT S ESQEEPW EE - E FG REHRRQLW LE E E EMWQQRQK KW A L L EQE HQE KL RQWN L E 130 

E R++ E Q EE E RE R+L EEE+ + Q+R+++ L QE KLR+ + 
SbjCt: 931 ERERKLREEEQLLRREEQELRRERARKL- REEEQL LQEREEE- RLRRQERARKLRE EE -Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ++ E R R + E L 

Sbjct: 988 LLRREEQ ELRQERD RK FRE EEQ LLQERE EERL RRQERD RX FREEE RQ L 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%) 

Query: 72 ES S RQVT S ESQ EEPW EEE FGREMRRQ LM L EE E EKWQQRQK KW AL L EQE HQE KLRQWN LEO 131 

E R+ + E Q EE+ R+ R + EEE + + +Q ++ + L QE KLR+ E 
Sbjct: 841 ERERK L RE EEQ LLRQE EQ E L RQERA RK L RESEQ LL RQE EQE LRQERDRKLREE--EQ 895 

Query: 132 LAREQQRRWVQLEK EQES P RRE PEQLG E D VE RRI FT PT SRW RDL E KAE 179 

L R++ + + +L ♦ £+ + RE EQL ++ E R R L + E 

SbjCt: B96 LLRQEEQ ELRQE RDRKLRE EEQLLQE S E E ERLRRQ ERE R K L R EE E 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ES S RQ VTS E S QEE PW EE E FGREMRRQLWLEE E EMWQQRQKKW AL LEQEHQE KLRQWN LED 131 

E R+ E Q EE E R L EEE Q ++ + L QE + KLR+ E 
Sbjct", 578 E KRRRQERE RQ Y RE LE E LRQE EQL RDRK L RE EEQL LQE RE E E R L RRQE RER K LRE E - - EQ 635 

Query: 132 LAREQ QRRW VQL EKEQES P RRE PEQLGED V E RR I 165 

L R++ Q R +L +E+ + RRE + + L *+ ER++ 

SbjCt: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLSLESS RQVT5 ESQ- -EEFWEEEFGREMRRQLWLEEEEMWQQKQK KWALLEQEHQEKL 124 

++L. E R+ + E Q +E EE R+ R + EEE+ + +Q + + + L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE LRQERE RX L 720 

Query: 125 RQWN LED LAREQQRRWVQLE K EQES P RRE P EQLG E D VERRI FT PT S RWRDLEK 177 

R+ + L RE+Q L +E+ + RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQER EQQLRR 768 

Score = 98 (14.7 bits), Expect = 2.6e+00, P = 9.2e-01 
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 E P V LL PL VDRRFP KKWE RP V AES LGH KDK DQED YFQK GGLQ I KFH C S KQ LS LE S S RQVTS 79 

E LL + + ++ ER + E + + E+ + + K +QL + + + + 

SbjCt: 655 EEQLL RR EEQELRQERE RKL REE EQLLQEREEE R LRRQERARKLREEEQ LLRQE EQE L RQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLW LEE E EMWQQRQKKW ALLEQEHQE KLRQWN LED- LAREQQR 138 

E + + EEE + + RR+ L +E + + +++ LL+ + +E+LR+ E L RE+ R 
SbjCt: 715 ERE RKL R EEE — QLLRREEQ LLRQ ERDRK L R EE EQLLQES EEE RLRRQ EREQQLRRE RDR 772 

Query: 139 RWVQLEK EQE SPRRE P EQ LG - ED VERR I 165 

♦+ E+EQ RE E+L ++ ER+ + 
Sbjct: 773 KF — REE EQL LQ E RE EERLRRQE R ERKL 798 

Score = 97 (14.6 bits), Expect = 3.3e+00, P = 9.6e-01 
Identities = 38/129 (29%), Positives = 63/129 (48%) 

Query: 72 ESS RQVT SESQ — EEPWEEEFG REMRRQLW LEE E EHWQQRQKKWALLEQ EKQE K L RQWN L 129 

E R++ E Q +E EE R+ R + EEE+ + +Q +++ L QE KLR+ 
Sbjct: 817 ERE RK LREEEQLLQE REEERL RRQ ERERKL REE EQ LLRQEEQE L RQERARKL R E E — 871 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRT 189 
E L R+ + + + +L +E+ + RE EQL E + + R R L + E L+ 
Sbjct: 872 EQLLRQEEQ ELAQE R DRK LREEEQL LRQEEQEL RQE RD RKLREEE - QL LQC S EEC 925 

Query: 190 QSAHQSRRFKL 200 

* OR L 
Sbjct: 926 RLRRQERERKL 936 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 

Query: 46 KDKDQEDrFQKGGWI-KFHCSKQlSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 

♦+ + QE F + Q* + **<3l ESQ E * E+ G* R QL +EE 
Sbjct: 473 RE RRQE LQ FL EEEEQLQ RRERAQQLQE E DS FQ E DRERRR RQQEQRPGQTW RMQ L QEE 529 

Query: 105 MWQQRQ K KW ALLEQEH QE K LRQWN LE DLAREQQRRWVQL EKEQ ES P RREP EQLGED VE RR 164 

++R +A Q QE+LR* E+L RE++R+ E+E+E E O ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ--QEQLREE--EELQREKRRQ EREREYREEEKLQREEDEKRR 581 

Query: 165 IFTPTSRWRDLEK 177 
++R+LE* 

Sbjct: 582 RQERERQYRELEE 594 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 

Query; 28 DRRFP It KWER PV A ES L -GH KD KDQE D Y FQKGGLQ I K FHC S KQL S LE S5 RQVT S ESQE E Ptf 86 

+ R++ + E ELK ++ + E Q+ ♦ ++ L Q+ + ++E 

SbjCt: 586 ERQ Y RE L E ELRQE EQLRDRKLREE EQLLQEREEERLRRQ ER ERKLREEEQL LRQEEQE- L 644 

Query: B7 EEE FG REMRRQLW L E E EEMWQQRQK Ktf ALLEQEHQ CKLRQWN LEDLAREQQRRWVQL 143 

+E R*+R ♦ L EE+E+ Q+R++K L *E Q L++ E L R+++ R +L 
Sbjct: 645 RQERE RKLREEEQLLRREEQE LRQERERK - — LREEEOr L LQEREEERLRRQERAR — KL 698 

Query: 144 EK EQES P RRE PEQLGED VERR X 165 

+£♦ + R+E * + L + * ER+* 
Sbjct: 699 REEEQ L LRQEEQE LRQ ERE RK L 720 

Score = 95 (14.3 bits), Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGWIKFHCSKQLSLESSRQVTS 79 

E LL ♦ ♦ ++ ER + E + +E+ * + K +QL + ♦++ 

Sbjct: 655 EEQLLRREEQEL RQ ERERK L REE EQL LQE REEE RLRRQ ERARKLREEEQL LRQE EQ £ LRQ 714 

Query: BO ESQE E PWE E E FGREMRRQLW L E EEEMWQQRQK KWAL L EQEH QE K L RQKNLE D - LAREQQ R 138 

E * + EEE + +RR+ L +E ++ ♦ + + LL++ +E+LR+ E L RE* R 
Sbjct: 715 ERE RK L RE E E — QLL RREEQLLRQE ROR K L REEEQL LQE S E E ERLRRQEREQQLRRERD R 772 

Query: 139 RWVQLEKEQ E S P RRE P EQLG - CO VERR I FT PT S RttRDL E KAE LS LV P A P S RTQS AH Q — S 195 

♦ ♦ E+-EQ RE E + L + + ER+ + +♦ E+ L + + Q 

Sbjct: 773 KF--REEEQL LQERE E E R LRRQERER KLR EE EQLLQE R E EE RLRRQERE RKLREEEQLLO 830 

Query: 196 RHPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 

R ♦ + + L ++ + E R R * + +R+ 

Sbjct: 831 E RE E E RL RRQERERK LREEEQL LRQE - EQ E L RQ ERARK LRE E EQL LRQEEQELRQE RD RK 889 

Query: 256 LQISPANIKKKVYHNKIEAQRK--- NLQLLSEESELRLPHYLRSKAL 299 

L+ + + + + + E RK QLL E E RL R + L 

Sbjct: 890 L REE EQL LRQEEQE L ROE RORKLREEEQLLQES EEERL RRQE RE RKL 936 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 

Query: 72 ESS RQVT SESQEEPWEEE FG REMRRQLW L EEE EKWQQRQ KKW ALL EQEH QEK L 124 

E *R* + E Q EE* R* R ♦ ♦ EEE*+ Q+R+++ L QE K L 
Sbjct: 977 ERA RKLREEEQLLRREEQE LRQERDRKFREE EQLLQEREEE - RLRRQERCRK FREEERQL 1035 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQES F RRE PEQLGED VERRI FTPTSRMRDLEKAELSL 182 

R+ LE* R+++ R *LE EQ +E *QL R F + R ++ E L 

Sbjct: 1036 RRQE L EEQ FRQERDRK FR LE - EQ I RQEK E EKQLRRQER DRK FREE EQQRRRQEREQQL 1092 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 

Query: 67 KQLS L ES SRQ VT S E SQ- - EE PWEEE FG REMR - RQLWLE E E EHNQQ RQK KWALLEQEHQE K 123 

♦+L E R+ E Q +E EE R+ R R*L EEE+ + + Q++ L QE+ 
Sbjct: 1250 QELRRERDRX FRE E EQLLQERE EER L RRQE RARK LREE E EQL L FEEQE EQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R*+ E+ ARE+ + R +LE+E R+E EQ R T R E+ E 

Sbjct: 1306 DRR Y RAE EQ FAREEK S R RLE RE L RQEE EQ RR RRERER K F REEQ LRRQOEE - EQRR 1359 
Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T*Q ARE* R++ P 
Sbjct: 1360 RQLRERQFREDQSRRQVL--EPGTRQFARVPVRSSPLYEYIQEQRSQYRP 1407 

Score = 93 (14.0 bits), Expect = 8.3e+00, P = 1.0e+00 
Identities = 41/145 (28%), Positives = 72/145 (49%) 

Query: 28 DRRFPKKWERPVAES LGHKDKDQEDYFQKGGLQIKFHCSKGLSLESSRQVTSESQEEPW- 86 

+RB t+CR+E L R + QE+ + 

Sbjct: 408 E RRQRQE R£ RE L EEQARRQQQM QAEE ES ERRRQ- RLS ARP5 L RE RQLRAEE RQEQEQRFR 466 

Query: B7 - EEE FG REKRRQL- W LEEE EMWQQRQKKW A L LEQE — H Q£ K LRQWN LEDLAREQQ RRWVQ 142 

£EE RE R+*L +LEEEE O+R** L E++ R+ ++ 0 RW Q 

Sbjct: 467 EEEEQRRERRQE LQ FLEE EEQLQRR ERAQQLQEE DS FQEDRERRRRQQEQR PGQTWRW - Q 525 

Query: 143 LEKEQESPRR EP— EQLGEDVE 162 

L**E + R + P EQL E+ E 

Sbjct: 526 LQ EEAQ RRRHT L Y AK PGQQEQLRE EEE 552 

Score = 91 (13.7 bits), Expect = 2.4e+00, P = 9.1e-01 
Identities = 38/110 (34%), Positives = 57/110 (51%) 

Query: 72 ESS RQVT S ESQE E PW EE - E FGREMRRQLWLEEE EMWQQ RQK KM A LLEQEHQEKLRQWN L - 129 

E R++ E Q EE E RE R+L EEE + + Q+R+++ L QE KLR+ 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 968 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRI FTPTSRWRDLEKAEL 180 

+ + L + E+ R+ + E+EQ RE E+L R F R L + EL 

Sbjct: 989 LRRE EQELRQERDRK F- - REEEQLLQERCE ERLRRQERORK FREEER- -Q LRRQ EL 1040 

Score = 89 (13.4 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 35/138 (25%), Positives = 65/138 (47%) 

Query: 82 QE E PWEEE FGREK RRQLW LEEEEM - - WQQRQ K KWALL EQEHQEKL RQWNL EDLAREQQRR 139 

Q E++ E+R + + +E E WQ»++++ L E+E Q K R+ + + R+ + + 
Sbjct: 111 QH RRQEDORR FELRD RQFED E PERRRWQKQEQERE LAEEEEQRKKRERFEQKYS RQY RDK 170 

Query: 140 WVQLEKEQ-ESPRREPEQL GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 194 

+L++++ ERE EQL GDEF +RE+EL 0+ 
Sbjct: 171 EQRLQRQELEERRAEEEQLRRRKGRDAEE--FIEEEQLRRREQQELKR-ELREEEQQRRE 227 

Query: 195 S RRPHLPKS PSTQQPALGKQR 215 

R H ♦+ L ++R 

Sbjct: 228 RREQHERALQEE EEQLLRQRR 248 

Score = 50 (7.5 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTOKQKGLEEKHRE 3B3 

R + R+E Q+tEE + + LE + R Q LR + ♦♦♦♦ E+ + R 

Sbjct: 245 RQRRWRE E P REQQQLRRELEE 1REREQR LEQEE RREQQLRREORLEQEERREQQ LRR 301 

Query: 384 CL5SMVTMFPICLQLEWNVHLNI P- EVTSPK PKKCKLPAASPRHI RPSG PT YKQP FLS RHR 442 

L * +L+ E + E ♦ K +L R R ♦+ L+ 

Sbjct: 302 E LEE I RE REQRLEQ EERREQ RL EQEERREQQLK RELEE I REREQ R LEQEE RREQLLAEE V 361 

Query: 443 ACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASLPRDQ 484 

+ AR++G+ ♦ tl+ ++ S + A + K S PR Q 
Sbjct: 362 R EQ AREAGES LTRRMQRQL ES EAGARQS KV - YS RPRRQ 398 

Score = 40 (6.0 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 276 RKN LQL LS EES EL R L PH Y L R5 KA L - - E LTTTTMELG ALRLQ YLC H K Y I FY RRL - QS LRQE 332 

R* QLL E E RL R+ + L E E LR Q K+ +1 Q +E 

Sbjct: 959 REE EQLLQEREEE RLRRQERARK LREE EQLLRREEQELR -QERDRK FREEEQLLQEREEE 1017 

Query: 333 A I K HVQ I MK ETEA5 Y KAQN LT1-FLEHI DRLQS LRLQAHT DKQ- KGLE EKH RE 383 

♦ + +E E ♦ Q L F ♦ DR L O +K> K L + R+ 

Sbjct: 1018 RLRRQERORK FRE EERQ L RRQE LEEQ FRQE RDRKFRL EEQ I ROE KEEKOL RRQERD 1073 

Score = 37 (5.6 bits), Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 

Query: 276 RKN LQLLS EES E L RL P H Y LRSKAL - ELTTTTKE LGALRLQY LC HK Y I F Y RRLQS LRQE 332 

Rt QLL E E RL R + L E E LR Q K R + L QE 

Sbjct: 775 REEEQLLQERE EERLRRQERERKL REEEQL LQERE E ERL RRQE R E RKL -• — REEEQLLQE 831 

Query: 333 AINHVQINKETEASYKAQHLYIFLEKIDRLQSLRLQAWTDKQKGLEEKHRE 383 

+E E ♦ E L* R + + +++ L t + + E 

Sbjct: 832 RE EE RL RRQE RERK LREEEQLLRQEE -QE LRQ ERAR KLRE EEQL LRQEEQE 881 
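
Each alignment above reports Identities and Positives as a ratio followed by a whole-number percentage. A minimal sketch of that arithmetic follows; it assumes the percentage is rounded down, which is inferred only from values in this listing (33/108 is shown as 30%, 32/97 as 32%). The function name is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching columns, rounded down.

    Rounding down is an assumption inferred from the listed values,
    not a documented property of the program that produced them.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # Identities and Positives of the 37/113 and 60/113 alignment
    print(blast_percent(37, 113), blast_percent(60, 113))
```

A check like this is mainly useful for repairing percentages that were damaged in transcription, since the ratio next to each one usually survives intact.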
Pedant information for DKFZphtes3_50n23, frame 1 



Report for DKFZphtes3_50n23.1 

[LENGTH] 499 

[MW] 58885.69 

[pI] 9.67 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.42 % 

SEQ HTVR5RVAOVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 
SEG 
PRD ccccccceeeccccccccceeeeccccccccccccchhhhhhhcccccccccccccccce 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 
SEG xxxxxxxxxx . .xxxxxxxxxxxxxxxxxxx 
PRD tttecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccchhhhhhhh 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 
SEG xxxxxxxxxxxxxxx. . . 
PRD hccccccchhhhhccccccccccccccccccccccccceeeeeecccccccccccccccc 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 
SEG xxxxxxxx 
PRD ecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLEHID 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLGLEWNVHLNIPEVTSPKPKKCKLPA 
SEG 
PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccccccccccc 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQHARQQGKQMEAVWKTEVASSSYAIEKKTPASI 
SEG 
PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccc 

SEQ PRDQLRGHPDIPRLLTLDV 
SEG 
PRD ccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 
DKFZphtes3_6b21 



group: testis derived 

DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to human KIAA0256 
gene product. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to KIAA0256 

complete cdna, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 
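
The header above gives one position for the poly(A) stretch and one for the polyadenylation signal. As a rough sketch of how such features can be located in an insert sequence, the snippet below searches for the first long run of A and the nearest upstream AATAAA hexamer. The function name, the `min_run` threshold and the 1-based reporting are assumptions made for illustration; the listing does not state which signal variants or which position convention its own annotation used, so the numbers returned here need not coincide with the ones printed in the header.

```python
import re


def find_polya_features(seq: str, min_run: int = 15):
    """Return (signal_pos, polya_pos) as 1-based positions, or None for a missing feature.

    polya_pos  : start of the first run of at least `min_run` consecutive A.
    signal_pos : start of the last AATAAA hexamer lying entirely upstream of that run.
    """
    cleaned = "".join(seq.upper().split())  # drop the spaces between 10-mers
    run = re.search(r"A{%d,}" % min_run, cleaned)
    if run is None:
        return None, None
    signal = cleaned.rfind("AATAAA", 0, run.start())
    return (signal + 1 if signal >= 0 else None), run.start() + 1


if __name__ == "__main__":
    toy = "GCGC" + "AATAAA" + "GAGTC" + "A" * 20  # toy sequence, not taken from the listing
    print(find_polya_features(toy))
```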

1 GGCAAGCCGA CCGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 

51 CTCGCGGCAT GCCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 

101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 

151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TCTCTTCCCC AGCTCTGCAG 

201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT 

251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 

301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 

351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG 

401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 

451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 

501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 

551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 

601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 

651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 

701 TAAGACAAGT ACTAAAACCA GCTGCAGTGT TATCAAAGGC TGAAATAGTG 

751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 

801 TCCTTCATGT ACAAGAGAGT TATCTTGGAC ACCAATGGGT TATGTTGTTC 

851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 

901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 

951 TATACCATCT TCTGAACCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 

1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 

1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 

1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 

1151 ATGCCGAGCA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA 

1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 

1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA 

1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 

1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 

1401 ATGTCCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 

1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG 

1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA 

1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 

1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 

1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 

1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 

1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC 

1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 

1851 TAAAGAAGTC GATGCTTCTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 

1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 

1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 

2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG 

2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 

2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 

2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 

2201 CCCAGGATCA GTTCCACAAG ATCGTTGAGC TGACAGTGGC GGCCCGACAG 

2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 

2301 CAGGCCTCAC GCACCTCCCA CCCTACCCAC ACAGGGCCCC AGCTCCCCTG 

2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT 

2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATCTA CCCTGGAGCT 

2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT 

2501 GAGAGTTCTT GCCTGTCTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA 

2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 

2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT 

2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATCTG 
2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GCCTTTCTGA 

2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTCATACT TAGAATACTT 

2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATACCTG AGCCAAGTGA 

2851 GTGAGTTTCC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 

2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT 

2951 GGCACAAGGG TGCACCGCTC CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 

3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 

3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 

3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 

3151 AATGTGTATT GAACTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 

3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 

3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 

3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 J 



BLAST Results 



Entry HS773347 from database EMBL: 
human STS WI-1816Q. 
Score = 813, P = 2.9e-30, identities 



Peptide information for frame 1 



ORF from 157 bp to 2499 bp; peptide length: 
Category: similarity to known protein 
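
The ORF coordinates and the peptide length are tied together by simple arithmetic: an inclusive span from 157 bp to 2499 bp covers 2343 bases, i.e. 781 codons, which agrees with the length of 781 reported for this clone further down. A minimal sketch of that check follows (the function name is hypothetical, and the listing does not state whether the stop codon is counted inside the quoted span):

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive, 1-based start and end positions."""
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not precede start_bp")
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


if __name__ == "__main__":
    print(orf_peptide_length(157, 2499))
```

The same relation can be used to cross-check any ORF line whose numbers look damaged: a span that is not divisible by three points to a misread coordinate.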



1 MVRVLRSKCL PQLCSHILSV CSCTTSDRNV YSVPGSQYLY NQPSCYRCFQ 
51 TVKHRNEHTC PLPQEHXALF KKKTYDEKKT YDOQKFDSER ADGTISSEIK 
101 SARGSHHLSI YAENSLRSDG YHKRTDRKSR IIAKNVSTSK PEFEfTTLOF 
151 PELQGAENtJM 5EIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK 
201 NNPNESVTAN AATNSPSCTR ELSwTPHCYV VRQTL5TELS AAPKKVTSMI 
251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHI IHPTQK SKASQGSDLE 
301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 
351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 
401 PWVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LHKKGKQREI 
451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSODTQD GESGGDDQFP 
501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPKHTTFPK 
551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQORMYQKD PVKAKTKRRL 
601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQWI 
651 PFVFALKRKA LGRSLNKAVP VSWGIFSYD GAQOQFHKMV ELTVAARQAY 
701 KTHLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHY1EI 
751 WKKHLEAYSG CTLELEESLE ASTSQMMNLM L 
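
The peptide above is the conceptual translation of the ORF in reading frame 1. The sketch below shows one way to reproduce such a translation from the nucleotide rows; it assumes the standard genetic code and stops at the first stop codon. `translate` and its `start_bp` parameter are illustrative names, not part of the original annotation pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate from the 1-based position `start_bp` up to the first stop codon.

    Whitespace between 10-mers is ignored; codons containing characters
    other than A, C, G, T are rendered as 'X'.
    """
    seq = "".join(dna.upper().split())
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # First 27 bases of the row that starts at position 151; the ORF starts at 157,
    # which is the 7th base of this fragment.
    print(translate("TGTGGCATGG TTAGAGTCCT CAGAAGC", start_bp=7))
```

Translating the start of the row at position 151 this way gives MVRVLRS, the first seven residues of the peptide listed above.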

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6b21, frame 1 

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score = 
766, P = 3.6e-78 

TREMBL:PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, N 
= 2, Score = 161, P = 3.1e-10 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 150, P = 9.1e-07 

>SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
Length - 63 S 
[Alignment segment start positions; the aligned Query and Sbjct residue rows are not recoverable] 

Query: 369   Sbjct: 16 
Query: 427   Sbjct: 76 
Query: 486   Sbjct: 134 
Query: 542   Sbjct: 193 
Query: 601   Sbjct: 253 
Query: 661   Sbjct: 313 
Query: 718   Sbjct: 373 
Query: 767   Sbjct: 431 

[Surviving match-line and row fragments] 

PVQLDLG ML ALEK+Q 

KKGK++EI K K+PT+LKK+ILKER+E+K RL 

+ P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+YQKDPV+AK +RRL 

VfCLREV KH + KL K+KCVI ISPNCEKIQSKGGLD+ L+ +1 A EQ IPFVFAL RKA 

LCR «-NK VPVSWGIF+Y GA+ F+K+VELT AR+AYK M+- ++QE 

718 QAPPSLP-TQG PS-- 

-CPAEDGP PALKEKEEPHYI E I WK KHIEAYSGCTL- 

Pedant information for DKFZphtes3_6b21, frame 1 



Report for DKFZphtes3_6b21.1 



[LENGTH] 781 
[MW] 87393.44 
[pI] 8.94 
[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
[PROSITE] MYRISTYL 4 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 16 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 6 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 9.45 % 



SEQ MVRVLRSHCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRKENTC 
SEG 
PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc 

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHKLSIYAENSLKSDG 
SEG xxxxxxxxxxxx 
PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc 

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS 
SEG 
PRD ccceechhhhhceeccccccccceeecccccccccccehhhhhhccccceccceeecchh. 

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS 
SEG 
PRD hhhhhhheeeeecccceeeeccccceeeeeececccccceeeeeecceeeeeeccccccc 

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE 
SEG 
PRD ccccceeee e rthhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch 

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP 
SEG . . . , xxxxxxxxxxxxxx 
PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc 

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE 
SEG xxxxxxxxxxxxxxxxx 
PRD cccccccccccceccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeeeccccc 

SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS 
SEG 
PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhWihhhhhcc 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 
SEG 
PRD ccccccccccccccccccchhhhhhcceccceteocccccccccccccccccccccccee 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 
SEG 
PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhh 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 
SEG xxxxxxxxxx 
PRD hhhhhhhhhhhhhhhh««« e«cccccccccccccchhhhhhhhhhhhcccce e ••ccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP 
SEG 
PRD ccccccccee«««Be*«cccccchh(ihhhWhhhhhf»hhhhhhhhhhhhhhhhhcccccc 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 
SEG xxxxxxxxxxxxx 
PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ L 
SEG 
PRD c 



Prosite for DKFZphtes3_6b21.1 

PS00001 135->139 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001 
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001 
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001 
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001 
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006 
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006 
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 
PS00008 633->639 MYRISTYL PDOC00008 
PS00009 421->425 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_6b21.1) 
DKFZphtes3_6c11 



group: signal transduction 

DKFZphtes3_6c11 encodes a novel 1025 amino acid protein with similarity to A. ambisexualis 
antheridiol steroid receptor. 

The novel protein is a putative steroid receptor. It shares similarity with yeast YNL132w and 
contains the ATP/GTP-binding site motif A (P-loop) and an RGD site, similar to the A. 
ambisexualis antheridiol steroid receptor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this receptor. 



strong similarity to YNL132w 

strong similarity to S.pombe/YOK9_SCHPO, S.cerevisiae/YNL132w, 
C.elegans/F55A12.8 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873 



1 GCTGTGCCTT CTCTTTCGGA GTTCTTCCGT GCTCCCACGT GCTTCCCCTT 

51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 

101 CATGCATCGG AAAAAGCTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 

151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 

201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 

251 GGCTCGGCCT TCAGTGCTGT GGTCTTATAA GAAAGAGCTG GGGTTTAGCA 

301 GTCACCGGAA GAAAACAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 

351 ACACTGAACA TAAAGCAGCA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 

401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA 

451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG 

501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 

551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 

601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 

651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 

701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 

751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG 

801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 

851 CTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT CTCTTGAAAT 

901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 

951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 

1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 

1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 

1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCACTCTC TAAATCCTGA 

1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 

1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA 

1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG 

1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 

1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 

1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC 

1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCACG 

1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 

1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 

1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 

1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 

1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCACA TGCTCTCCGA 

1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC 

1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTG CCTTGAAGGG 

1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 

1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 

1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 

2001 GATTATCAAG CGATGGGCTA TGGCAGCCGT GCTCTCCAGC TGCTGCAGAT 

2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 

2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 

2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 

2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 

2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 

2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 

2351 GACGCTCACT GATGAGGATG AGGCTGACCA CGGAGGCTGG CTTGCACCCT 

2401 TCTGGAAAGA TTTCCGACGG CGCTTCCTAG CCTTGCTCTC CTACCAGTTC 

2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT 

2501 GGCGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAACCACTCT 
2551 TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG 

2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 

2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 

2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 

2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 

2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 

2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 

2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 

2951 GGAAGTAGGG AAGCTGAAGA CCATGGACCT CTCTGAATAC ATAATCCGTG 

3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 

3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 

3101 AGAACCCAA* CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 

3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 

3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACT6AACCCT CTCTGGCTGG 

3251 ACTCTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 

3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 

3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 

3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT 

3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC 

3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CCCCTGAGGC CCTTAAGTAC 

3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 

3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 

3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT 

3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 

3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 

3801 CCATTTGGGA AAAGATCTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 

3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA 

3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3951 AAAAAAAAAA AAAAAA 

BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969) 
ATP_GTP_A (284-292) 
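
The ORF coordinates, the reading frame and the peptide length reported in these entries are arithmetically linked. The following is a minimal sketch of that relationship, assuming 1-based inclusive coordinates on the forward strand and an end coordinate that excludes the stop codon; the function name `orf_summary` is hypothetical and is not part of the original report.

```python
def orf_summary(start_bp, end_bp):
    """Return (frame, peptide_length) for a forward-strand ORF.

    Assumes 1-based inclusive coordinates and that end_bp is the last
    base of the last sense codon (stop codon not included).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start_bp - 1) % 3 + 1   # frames are numbered 1, 2, 3
    return frame, span // 3


if __name__ == "__main__":
    # ORF from 102 bp to 3176 bp, as listed for frame 3 above
    print(orf_summary(102, 3176))
```

Under these assumptions the coordinates 102 to 3176 give frame 3 and 1025 residues, in agreement with the values listed for this entry.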



1 MHRKKVDNRI RILIEBGVAE RQRSLFWVG DRCKDQVVIL HKMLSKATVK 

51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT 

101 NIRYCYYNET HKILGBTFGH CVLQDFF-ALT PNLLARTVET VEGGGLWIL 

151 LRTMHSLKQL YTVTHDVHSR YRTEAHQDW GRFNERFILS LASCKKCLVI 

201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 

251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG 

301 AVAFGYSNIF VTSPSPDNLH TLrEFVFKGF DALQYQEHLD YEIIQSLHPE 

351 FWKAVIBVNV FREHRQTIQY IHPADAVKLG OAELWIDEA AAIPLPLVKS 

401 LLGPYLVFKA STINGYEGTG RSL5LKLIQQ LRQQSAQSQV STTAEN1CTTT 

451 TAR LAS ART L HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP 

501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQHLSD 

551 APAHHLFCLL PPVPPTQHAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 

601 SGDLI PWTVS EQFQDPDFGG LSGGRWfUA VHPDYQGKGY GSBALQLLQM 

651 YYEGRrPCLE EKVLETPGEI HTVSSEAVSL LEEVITPRKD LFPLLLXLNE 

701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 

751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 

801 GKFAQPALSR EELEALFLPY DLKRLEMY5R UMVDYHLIMD MIPAISRIYF 

851 LNQLGDLALS AAOSALLLGI GLQHKSVDQL EKEIELPSGQ LKGLFNRIIR 

901 KWKLFKEVQ EKAIEEQNVA AKOWKEPTM KTLSDDLDEA AKEFTJEKHKK 

951 EVGKLKSKDL SEY1IRG0DE EWNEVLNKAG PNASIISLKS DKKRXLEAKQ 

1001 EPKQSKKLKH RETKNKKDKX LKRKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6cll, frame 3 

TREMBL:CEAF3130_4 gene: "F55A12.8"; Caenorhabditis elegans cosmid 
F55A12., N = 1, Score = 2782, P = 1.1e-289 

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 2549, P = 3.5e-273 

SWISSPROT:YXX1_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score = 
1013, P = 3.2e-102 

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN 
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296 



>SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME 
I. 

Length = 1,033 

HSPs: 

Score = 2843 (426.6 bits), Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 
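
The percentages in the HSP summary can be recomputed from the quoted fractions. The sketch below assumes the report truncates to a whole percent rather than rounding, an assumption that reproduces the figures quoted above; the helper name `hsp_percentages` is hypothetical.

```python
import re


def hsp_percentages(line):
    """Parse 'Name = a/b' pairs and return truncated integer percentages."""
    result = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        # Integer division truncates, e.g. 750/1033 = 72.6 % is kept as 72.
        result[name] = (100 * int(num)) // int(den)
    return result


if __name__ == "__main__":
    print(hsp_percentages("Identities = 576/1033, Positives = 750/1033"))
```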

Query; 1 MH RK K VO M RI R I L I ENG V A ERQRS L FV V VG D RGKDQ W I LHHMLS KATVKARPSV LWC Y K 60 

H +K + D*RI LI+NG ■ E+QRS FVVVGOR +DQVV LH +LS + + V ARP+VLW YK 
Sbjct: 1 M P KKAL OS R I PT L I KNGCQ EK QRS F FW VG D RARDQVVN LH WLLSQS K V AAR P N V LWHY K 60 

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFCLFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF*SHRKKR +4+K+IK G + ♦DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKHtKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPKLLARTVETVEGGGLWILLRTMNSLROLyTVTMDVHSRYRTEAHQDV 179 

M VLQDFEALTPKLLART*ETVEGGC+VV+LL +nslkqlyt*+md*hsryrteah dv 
Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIWLLLHKLHSLKQLYTMSHDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIOOQLHILPISSHVATMEALPPQTPDESLGPSDLELRELK 239 

RFKERFILSL *C* CLVIDD+LN+LPIS ++ALPP «•♦♦ + ++EL+ 

Sbjct: 181 TARFNERFI ls LGNCEKCLVI DDELNVLPISGG- KNVKALPPTLEEDN" STQNSI KELQ 237 

Query: 240 eslqdtqpvgvlvoccktldqakavlkfiegisektlrstvaltaabgrgksaalglaia 299 

ESL + PC LV KTLDQA*AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAIA 
Sbjct: 23B ES LGEDH PAGALVGVT KTLDQARAVLTFVES I VEKSLKCTVSLT AG RGRGKSAALGLAIA 297 

Query: 300 GAVAFGYSNIFVTSPSPDHLHTLFEFVFKGFDALQYQEHLDYEIIQSLMPCFKKAVIRVN 359 

A+A GYSNIF+TSPSP+NL TLFEF+FKGFDAL Y+EH+DYtl IQS NP ♦ + A++RVN 
Sbjct: 298 AAIAHGYSNI FITSPSPENLKTLrEFIFKGFDALNYEEHVDYDI IQSTNPAYHNAIVRVH 357 

Query: 360 VFREH RQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 

+FR+HRQTIQYI P D+ LGQAELVVIDEAAAIPLPLV+ L+GPYLVFMASTINGYEGT 
Sbjct: 358 I FRDHRQT IQY I S PEDSHVLGQAELVVI DEAAA I PLFLVRXUGP Y LVFMAST I NG YEGT 417 

Query: 420 GRSLSLKLI QQLRQQS AQSQV STT AEJJ KTTTT A RLA S ARTLH EVS LQ ES I R Y A PG DA V EK 479 

GR5 LS LXL +OOLR *QS S ♦ NK+ + ♦ + S RTL E*SL E IRYA GD *E 

Sbjct: 418 GRSLSLKLLQQLREQSRI— YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474 

Query: 480 WLNDLLCLOCLN-ITRI VS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537 

WLN LLCLO ♦ +*R+ + G P P C LY V+RDTLF YH SE FLQR*H+LYVASH 
Sbjct: 475 WIN KL LC L DAA S YVS RKATQG FPH PS ECS L Y RVS RDTLFS YH P I S EArLQRHMS L Y V AS H 534 

Query: 538 YKNSPNDLQKLSDAPAKHLFCLLPPVPPTQHALPEVLAVIOVCLEGEISROSILNSLSRG 597 

YKHSPRDLQ++SOAPAH Lr LLPPV LP+ + VIQ+ LEG ISR+SI+BSLSRG 

Sbjct: 535 YKNS PNDLQLMSDAPAHQL FVLLPPVDLKN PKLPDF I CVIQLALEGS I SRES I KNSLSRG 594 

Query: 598 KRA5GDLI PWTVSEQFQDPDFGGLSGGRWRI AVHPDYQGMG YGS RALQLLQHY YEGRFP 657 

♦♦A GDLIPW +S+QrQD +F L G R+VRIAV P++ MGYG+RA+QLL Y+EG*F 
Sbjct: 595 QRAGG 0 L I PW L I SQQ FQDEH F AALGGA RIVRIAVSPEH VKMG YGT RAMQL LHEYFCGKFI 654 

Query: 658 C LE E K VLET PQE I HTVS S EA V 5 LLEE V I T PR- - KDL P P LLLK LKER P AERLDY LGVS 712 

E+ *+E+ +LEIRK +PPLLLKL+E E L Y+GVS 

Sbjct: 655 SASEEFKAVKHS LKR I GDEE I ENTALQTEKI HVRDAKTMPPLLLKL5 ELOPE PLH YVGVS 714 

Query: 713 YGLT PRLLKFWKRAGFV PVYLRQTPN DLTGEHSC I MLKTLTOEDEADQGGWLAAFHKDFR 772 

YGLTP L KFWKR G* P*YLRQT NDLTGEH+C+ML+ L 0 WL AF ++F 

Sbjct: 715 YGLTPSLQKFH KREG Y C P L Y L ROT AN DLTGE H TC VMLR VL EG RD S E WLGAFAQNFY 770 

Query: 773 RRFLALLSYQFSTFSPSLALHIIQNRNNGKP AQPAL5REELEALFLPYDLKRLEMY B2B 

RRFL+LL YQF F+ AL+ + + KG + L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RR FLS L LG YQFREFAA I T A LS V L DACNNGT K Y W JJSTS KLT N E E I N N V FES Y DLK R LES Y 830 
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Query: B29 SRNMV DYH L I MOM I P AI S RI Y FLNQLGD- LALS AAQS AL L LG I GLQH KS V DQLE K E I EL P 887 

S K++DYH+I+D++P ++ +YF + 0 ♦ L3 Q ++LL +GLQ+K++D LEKE LP 
Sbjct: 831 SNNLLDYHVIVDLLPKLAHLYFSCKFPDSVKLSPVQQSVLLALGLQYKTIDTLEKEFHLP 890 

Query: 8B8 SGQLMGLFNRI IRXWKLFNEVQEKAUEQHVAAKDWME PTMKTLSDDI.DE 939 

S QL+ + ** ♦£++ K IEE+* + K P ++L ++L E 

Sbjct: 891 5WQLLAML V K L5 K K I HKC 1 0 £ I ETK D I EEE LG5NKKT ESS N S K L PE FT P LQQS LEEELQE 950 

Query: 940 AAKEFQ- EKHKKEVGKLKSHDLSEY 1 1 RGDDEEKNEVLNKAGPNAS IIS LKS DKKRKLEA 998 

A E +K+ + ++DL + Y IRG++E+K KA H I R ♦ 

Sbjct: 951 GADEAMLALREKQRELINAIDLEKYAIRGNEEDW KAA EN - QI QKTNGKGA RW S I 1004 

Query: 999 KQEPKQSKKL- - KNRETKNKKDMKLKRKK 1025 

K E +++ L +++TK K K K +K 
Sbjct: 1005 KGEKRK N N S LDA S D KKT K EK PS S KKK FRK 1033 



Pedant information for DKFZphtes3_6cll, frame 3 



Report for DKFZphtes3_6cll.3 



[LENGTH] 1025 
[MW] 115104.57 
[pI] 0.50 
[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 0.0 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0 
[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05 
[PROSITE] ATP_GTP_A 1 
[PROSITE] RGD 1 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.80 % 



S EQ KHRKK VDNRI R I LI ENGV AERQRS LFVWGDRGKDQW r LHHMLSKATVKARPSVLWC YK 

SEG 

PRD cccccccchhhhhhcccccccceeeeeeeeccceeeeeeehhhhhhhhhhccceeehhhh 

SEO KELGFSSHRKKRMRQLQKKIKNGTLNIKODDPFELFIAATNIRYCYYKETHKILGNTFGM 

SEG 

PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee 

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDW 

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhtihhhhh 

S EQ GRFNERFI LS LASCKKCL V I DD0LN I L P I S S H V ATMEAL P PQT P DE5 LG PS DLELRE LK E 

SEG 

PRO hhhhhhhhhhhcccceeeaeecceeeecccccccccccccccccccccccchhhhhhhhh 

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG 

SEG xxxxxxxxx 

PRO hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh 

SEQ AVAFGYSN1 FVTS PSPDNLHTLFE FVFKGFDALQYQEHLDYE I 1 QSLNPE FNKA VI RVNV 

SEG xxx 

PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhhe««eeccccccceeeeeeh 

SEQ FREHRQTIQYIHPADAVKLGQAELWIDEAAAIPLPLVKSLLGPYLVFKASTINGYEGTG 

SEG 

PRD hhhhhhhee«eccccccccccc«eeehhhhhccchhhlihhhccce««eee«ccccccccc 

SEQ RSLSLKLI QQLRQQS AQSQVSTT A EH KTTTT ARLAS ART LHEV SLOES I RYAPGDAVEKW 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh 

SEQ LWDLLCLXLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKK 

SEG xxxxxxxxxxx 

PRO hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc 

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA 

SEG 

PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc 

S EQ SGDLI PWTVS EQFQD PO FGG LSGG RWR I A V H P D YQGMG YGS RA LQ LLQMY Y EG RFPC IX 

SEG 

PRD cccchhhhhhhhhhhccccccccc ee eee eccccccccccchhhhhhhhhhhhcccchh h 

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL 



[LENGTH] 

(KM) 

Ipl) 

EHOHOL] 

0.0 

(FUNCATJ 

( FUNCAT ] 

[PROSITE] 

[PROSITE] 

[KW] 

[KW) 
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SEQ KFWKBAGFVPVYLROTPNDLTGEHSCIMLKTLTDEDEADQCCWLAAFWKDFRRRriALtS 

SEG 

pro Mihhhcccceeeeecccccccwce«eee«ecccccccccchhhhhhhhhhhhhhhhhhhh 

SEQ YQFSTFSPSLALNIIQNRNMGKPAOPALSREELEALrLPYDLKRLEMYSRNMVDYHLIMD 

SEG 

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhhhhhhhccchhhhhhhh 

SEO MIPAISRIYrLHQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQUlGLFNRIIR 

SEG XXXXXXXXXXXXXXXXXXXXX 

PRD hhhhhtihhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh 

SEQ KWKLOIEVQEKAIEEQMVAAKDVVMEPTMltTLSDOLDEAAKEFQEKHKKEVGKLKSHOL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ SEYI I RCDDEEHNCVLNKAGPN AS IISLKS DKXRXLEAKQE P HQS K K L KNRETKN KKDMK 

SEG xxxxxxxxxxxxxxx 

PRD eceeececchhhhhhhhhccccc««e*Mcechhhhhhhhcccccccccccccccchhhh 

SCO LKRKK 

SEG xxxxx 

PRD hhccc 



Prosite for DKFZphtes3_6cll.3 



(No Pfam data available for DKFZphtes3_6cll.3) 

DKFZphtes3_6dl6 



group: testes derived 

DKFZphtes3_6dl6 encodes a novel 695 amino acid protein nearly identical to a sequence from 
human PAC clone WUGSC:H_DJ1185I07.2. 

The cDNA differs from the proposed gene model: it contains additional exons. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

WUGSC:H_DJ1185I07.2, differences to gene model 

differences to gene model of WUGSC:H_DJ1185I07.2: two exons skipped. 

Sequenced by DKFZ 

Locus: /map="7q11.23-q21" 

Insert length: 4572 bp 

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520 
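
The poly A stretch and polyadenylation signal positions quoted for each insert can be located by a simple scan. This is an illustrative sketch only: it assumes the canonical AATAAA hexamer and a trailing run of at least ten A residues, and the function name `find_polya_features` is hypothetical. The tool used for the original annotation may apply a different offset or signal definition.

```python
def find_polya_features(seq, signal="AATAAA", min_run=10):
    """Return (signal_pos, tail_pos) as 1-based positions, or None if absent.

    tail_pos is the first base of the trailing run of 'A' (if the run is
    at least min_run long); signal_pos is the last occurrence of `signal`
    that lies entirely upstream of that run.
    """
    seq = seq.upper().replace(" ", "")
    stripped = seq.rstrip("A")
    run = len(seq) - len(stripped)
    tail_pos = len(stripped) + 1 if run >= min_run else None
    search_end = len(stripped) if tail_pos else len(seq)
    idx = seq.rfind(signal, 0, search_end)
    signal_pos = idx + 1 if idx >= 0 else None
    return signal_pos, tail_pos


if __name__ == "__main__":
    toy = "GGC" + "AATAAA" + "CATTCTGTGACTG" + "A" * 12
    print(find_polya_features(toy))
```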

1 GGCGGCGCTA CCTTCGGACT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC 
51 GGCGCGGCGC TCCCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAGTG 

101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA 

151 GATTGGACCA TATGATCAAC AAATATGGGA AAAATCTGTT GAACAGAGAG 

201 AAATCAACGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA 

251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA 

301 GCCTGAAACT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG 

351 TATTTTTCCC CTTTTTCTTC CCGTGGTGGT TACAACTAAC ATCAAAGGTC 

401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT 

451 ATTATTCTGC TCCACTTCTA CCCCACACAG CATACCTCTG ACAGAGGTGA 

501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT 

551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG 

601 AAGGAAATTA ACAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG 

651 GTTCTAGTAC CACAGATAAC ACACAAGAGG CAGCACTTCA GAACCACGGT 

701 ACAACCACCT CTCACAGCGT TGGCACTGTC TTCAGAGATC TCTGGCATGC 

751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT 

801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT 

851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT 

901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA 

951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCACT 
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC 
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT 
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT 
1151 TCAAGTTCCA GACAGGATTC TCAGAGTGCA AGGCCAGAAT CTGAAACAGA 
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT 
1251 GTACCAGTGA GACAGATGTG GAAAATCATC ACATTAATCC ATGTGTGAAA 
1301 AAAGAATATA GAGATGACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA 
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA 
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA 
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG 
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT 
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT 
1601 TATGTGATTC CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGCT 
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT 
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC ACCCATTACT TTTTGCAAAA 
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC 
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC 
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT 
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA 
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGACGAAC 
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG 
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA 
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG 
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC 
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC 
2251 TGCCTGAAAG CTTCTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA 
2301 GTTGAAGTGT TTACATCAGA CTCTCTTGTG CAATTCTTAT ATTTATTTTA 
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA 
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT 
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG 
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTCAAC 
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT 
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG 
2651 TTTCTCTTCA ATTATTTTGG AACAATCCCA GGATCCAAAC TGATTAAGTT 
2701 ACAGTTTAAC CACCCTTCAG TATTAATATA TACCCTATTA TATAACAGGT 
2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTCTAAA 
2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG 
2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC 
2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG 
2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TCAGTGAGGT 
3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT 
3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTACTGTAGG 
3101 CTATTCATAC CACACTGAAA TGAACAACTG AA6AATAAGG CTAAGAACCA 
3151 ATAAAATATT TCTCTAATTG CTACTTGTAA AACTGTATCC AAATTTTCAG 
3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA 
3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA 
3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG 
3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG 
3401 ATTTTAAGTT CTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA 
3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA 
3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATCT 
3551 GGAATATCCT CATATTTTTA CCAIATTTTA AGAACTTTAA GACGATTAAT 
3601 TGTAAATAAT TTATTTGATT KGTGCAGTTC TAATCCCTAA ATCATAATCT 
3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT 
3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT 
3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTTTAT TTGCCTATGT 
3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA 
3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA 
3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT 
3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA 
4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA 
4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA 
4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT 
4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG 
4201 GGAGTTGATT TATTAACTAC AGTATACCTC TCAACAGTTT ATAAATAATA 
4251 TGTTGAATTA TGTCAGTGTG CGCAGCAGTA GAATACTAAA ACGAAAATGT 
4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA 
4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA 
4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG 
4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT CCCACATGTG TTGTATTTGA 
4501 AAGTTTTACA CTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA 
4551 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



NO BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 2191 bp; peptide length: 695 

Category: known protein 

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381) 



1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL 
51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF 
101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS 
151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS 
201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS 
251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE 
301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS 
351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE 
401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN 
451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV 
501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF 
551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS 
601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD 
651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2 

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 100, P = 0.08 

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, P 
= 0 



>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence. 
Length = 588 

HSPs: 



Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 510/515 (99%), Positives = 512/515 (99%) 



Query: 35  GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 94 
           GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 
Sbjct: 1   GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 60 

Query: 95  TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 154 
           TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 
Sbjct: 61  TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 120 

Query: 155 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 214 
           KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 
Sbjct: 121 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 180 

Query: 215 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 274 
           AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 
Sbjct: 181 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 240 

Query: 275 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 334 
           GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 
Sbjct: 241 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 300 

Query: 335 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 394 
           EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 
Sbjct: 301 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 360 

Query: 395 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 454 
           PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 
Sbjct: 361 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 420 

Query: 455 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 514 
           HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 
Sbjct: 421 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 480 

Query: 515 VIISFVVRVSLVWIFFFLLCVAERTYKQRLLFAKL 549 
           VIISFVVRVSLVWIFFFLLCVAERTYKQ  L+ K+ 
Sbjct: 481 VIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKM 515 




Score = 409 (61.4 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 92/115 (80%), Positives = 98/115 (85%) 




Query: 595 DVIVSS AFLLT I SWn CCA QI NLYLKMEKKPNKKEELT LVNHVLK 640 
           DVIV S +F++ iC+V+I C A Q I NL Y LKMEK K PNKKEELT LVNHVLK 
Sbjct: 474 DVIVLSMVIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKMEKKPNKKEELTLVNNVLK 533 



Query: 641 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 695 
           LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 
Sbjct: 534 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 588 



Pedant information for DKFZphtes3_6dl6, frame 2 



Report for DKFZphtes3_6dl6.2 



[LENGTH] 695 
[MW] 78466.68 
[pI] 9.30 
[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07 from 7q11.23-q21, complete sequence. 0.0 
[PROSITE] CYTOCHROME_C 1 
[KW] TRANSMEMBRANE 6 
[KW] LOW_COMPLEXITY 5.32 % 

S EQ HASKVTOAI VWYQKK ICAYDOQIWEKS VEQREI KGLRNKPKKTAH VK PDLI DVDLVRGSA 

SEC 

PRD ccceeeeehhhhhhhcccchhhhhhhhhhhhhhhcccccccccccccccce«e*»ccch 

HEM 

SEQ FAKAK P ES PWTS LTRKG I VR W F FP F F FRWWLQVT SKVIFFWLLVLY LLQVAA I V LFCST 

SEC xxxxxxxxxxx 

PRD hhhhcccccccccccccce««e«cchhhhhhhtihhhhhhhhhhhhhhhhhhhhhhe«ecc 

mem kmkmhmmmhkhmhmwottwmhmmmkhmhh 

seq ssphsipltevigpiwlmlllgtvhcq:vstrtpkpplstgckrrrklrkaahlevhrec 

SEC xxxxxxxx 

PRO ccccccc«««tehhhhhhhhhhhhh«««eeeccccccccccchhhhhhhhhhhhh«eecc 



HEM 



SEQ DC S STTOWTQ EGAVQNKGT ST SHSVGT V FROLMHAAF FL5GS KKAKM SI0K5TETDHGYV 

SEC 

PRO cccceccccce«e»MCCceccccchhhhhhhhhhhhhhcccchhhhhcccceeeccccc 

MEM 

SEQ SLDCKKTVKSGEDCIQNHEPQCETIRPEETAHMTGTI.RNGPSKDTQRTITNVSDEVSSEE 

SEG 

PRO cccccceeecccccccccccccccccccc«ee«ccccccccccccceie«cccccccccc 

MEM 

SEQ GPETGYSLRRHVDRTSEGVLRM AKSHH YKKHYPKEDA PKSGTSCSSRCSSS RQDS ESARP 

SEG xxxxxxxxxxxxxxxxxx. . . 

PRD ccccc«e««eeccccccchhhhhhcccccccccccccccccccccccccccccccccccc 

MEM 

SEQ ES E T E D VL*J£ DLLKC AEC H SSCTSET D VENHQ I N PC VKKEYRDDP FH Q5 H L PW LH S S H PG 

SEG 

pro cccchhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccccccccccc 

MEM 

SEQ LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR 

SEG 

PRD cccc««eee«cccccccce e*««hhhhhhhhhccccccccccccccccc«««cccccchh 

KEM HMMMMMMHMMKMHMMMKMMHKHNMKMMMM 



SEQ LS QAT DLEQLT AH S AS EL YVI AFG S N E DV I VLSMV IIS FWRVSL VW IFFFLLCVAERTY 

SEG 

PRD hhhhhhhhhhhhcccce«eeeeeccccce«eehhhhhhhhcchhhhhhhhhhhhhhhhhh 



SEQ KQRLLFAKL FCHLTSARRARKSEVPHFRLKKVQN I KMWLS LRS YLKRRCPQRS V DV I VSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccc««««t«hhhhhhhhhhhhccccce««e«ee« 

MEM MMMMMMM 

SEQ AFLLT I SW FICCAQI NL YLKMEKKPKKKEELTLVNMVLKLATKLLKELDSPFRL YGLTM 

SEC 

PRD «eee««e*«««eeehhhhhhhhhhcccchhhhhhhhhfihhhhhhhhhhcccccee«eccc 

MEM MMMMMKMMKMMMMMMMMKM 

S EQ NPLLYNI TQW I LSAVSGV I S DLLGFNLKLNKI K3 

SEG 

PRD cchhhhheee«««eeecchhhhhccceeeeeeccc 



Prosite for DKFZphtes3_6dl6.2 
375->381 CYTOCHROME_C PDOC00169 



(No Pfam data available for DKFZphtes3_6dl6.2) 

DKFZphtes3_72kll 



group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to an S. pombe 
hypothetical repeat-containing protein. 

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal 
K-LI signature. This sequence is responsible for transport of proteins from free polysomes 
into the microbodies. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S. pombe hypothetical repeat-containing protein 

complete cDNA, complete cds, 6 EST hits (3 from testis-derived 
libraries) 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1134 bp 

Poly A stretch at pos. 1124, polyadenylation signal at pos. 1088 



1 AACCTTTCAA GTCCCCCCTC CTTTCCTTAA AGTCTTTTAT AGGGGTCCCC 
51 TTCTTGCCCA TCTCCATCCT GTGACTCACG ACTGAAAGGG CACAGACAGG 
101 TCACTGCCAG CATTGTTGGG GCAAGCCTGC AACCACGCAT CACTGGGGAT 
151 CTGACATGAC AATGGCCGCC TGCCCCCTCT GAGGGCTACA GGACTTACCC 
201 CAGTGGGAAG CAGCTAAGCA GGTCTGACCA GCCGACCTGG ACCTGGCCAA 
251 GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG 
301 ATGTTTTCCT TCAAGGTGAG CAGATGCATG GGGCTTGCCT GCTTCCGGTC 
351 CCTGGCGGCA TCCTCTCCCA GTATTCGCCA GAAGAAACTA ATGCACAAGC 
401 TGCAGGAGGA AAAGGCTTTT CGCGAAGAGA TGAAAATTTT TCGTGAAAAA 
451 ATAGAGGACT TCAGGGAAGA GATCTGGACT TTCCGAGGCA AGATCCATGC 
501 TTTCCGGGGC CAGATCCTGG GTTTTTGGGA AGAGGAGAGA CCTTTCTGGG 
551 AAGAGGAGAA AACCTTCTGG AAAGACGAAA AATCCTTCTG GGAAATGGAA 
601 AAGTCTTTCA GGGAGGAAGA GAAAACTTTC TGGAAAAAGT ACCGCACTTT 
651 CTGGAAGGAG GATAAGGCCT TCTGGAAAGA GGACAATGCC TTATGGGAAA 
701 GAGACCGGAA CCTTCTTCAG GAGGACAAGG CCCTGTGGGA GGAAGAAAAG 
751 GCCCTGTGGG TAGAGGAAAG AGCCCTCCTT GAGGGGGAGA AAGCCCTGTG 
801 GGAAGATAAA ACGTCCCTCT GGGAGGAAGA GAATGCCCTC TGGGAGGAAG 
851 AGAGGGCCTT CTGGATGGAG AACAATGGCC ACGTTGCCGG AGAGCAGATC 
901 CTCGAAGATG GGCCCCACAA CGCCAACAGA GGGCAGCGCT TGCTGGCCTT 
951 CTCCCGAGGC AGGGCGTAGC CAGCATGCAG GTGCAGGGCC CTGTGGTCCA 
1001 GACTCCCCTG GGTTGGGATT CAAGTCCAGG GTGACCCCAT GTGCTGGAGA 
1051 AAATACACAC TCATTGGTCT CCTTGCTTTG AAAGATCCAA TAAAGTCCTG 
1101 AGGCAAGGTT TGGAAAACCA ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164) 
LEUCINE_ZIPPER (149-171) 
LEUCINE_ZIPPER (156-178) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (170-192) 
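
The overlapping LEUCINE_ZIPPER ranges listed above are what a heptad pattern scan produces when leucines recur every seventh residue. The sketch below assumes the commonly cited pattern L-x(6)-L-x(6)-L-x(6)-L and reports 1-based inclusive coordinates; the name `leucine_zippers` is hypothetical. The listing's ranges cover 23 positions for this 22-residue pattern, so its end coordinate appears to follow a different convention.

```python
import re

# Four leucines, each separated by six arbitrary residues.
# The lookahead lets overlapping matches be reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(peptide):
    """Return 1-based inclusive (start, end) pairs, overlaps included."""
    hits = []
    for m in ZIPPER.finditer(peptide.upper()):
        start = m.start() + 1
        end = m.start() + len(m.group(1))
        hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Toy sequence with leucines at positions 3, 10, 17, 24 and 31
    toy = "GG" + "LAAAAAA" * 4 + "L"
    print(leucine_zippers(toy))
```

Five leucines at a seven-residue spacing give two overlapping hits in the toy sequence, just as consecutive hits in the listing are offset by seven positions.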
1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA 
51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF 
101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL 
151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM 
201 ENNGHVAGEQ NLEDGPHNAN RGQRLLAFSR GRA 
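
A peptide listing of this kind can be cross-checked against the cDNA by conceptual translation from the ORF start. The following is a minimal sketch, assuming the standard genetic code and a 1-based start coordinate; `translate` and its parameters are hypothetical names, and unknown codons are rendered as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start_bp=1):
    """Translate from the 1-based position start_bp up to the first stop."""
    seq = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # First eleven codons of the ORF starting at 268 bp, followed by a stop
    print(translate("ATGGCCACCC CGCCATTCCG GCTGATAAGG AAGTAG"))
```

Spaces are stripped before translation so that sequence rows can be pasted in the ten-base blocks used in the listing.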

BLASTP hits 

Entry SPCC330_4 from database TREMBLNEW: 

gene: "SPCC330.04c"; product: "hypothetical repeat-containing protein"; 
S. pombe chromosome III cosmid c330. 

score - H9, P - l.«e-08, identities - 55/187, positives - B8/1B7 

Entry A45973 from database PIR: 
trichohyalin - human 

Score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194 



Alert BLASTP hits for DKFZphtes3_72kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll, frame 1 



Report for DKFZphtes3_72kll.1 



[LENGTH] 233 
[MW] 28752.65 
[pI] 5.70 
[PROSITE] LEUCINE_ZIPPER 5 
[PROSITE] MICROBODIES_CTER 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] PKC_PHOSPHO_SITE 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 
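
The [MW] entry in such a report is a molecular weight derived from the peptide sequence. The sketch below shows one common way to compute it, summing average residue masses and adding one water for the free termini. The mass table is a commonly tabulated set of average values and is an assumption here; the exact table used by the original annotation software is not stated, so small differences from the reported figure are to be expected.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(peptide):
    """Average molecular weight of an unmodified linear peptide."""
    residues = peptide.upper().replace(" ", "")
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


if __name__ == "__main__":
    print(round(molecular_weight("MATPPFRLIR"), 2))
```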



MATPPFRLI RKMFS FKVSRWMGLAC FRSLAAS5 PS I ROKKLMHKLQEEKAFREEHKI FRE 
cccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh 
KI EDFREEMWT FRGKI HAFRGQI LGFWEEERPFWEEEKT FWKEEKSFWEKEKS FREEEKT 

XXXXXXXXXXXXXXXXXXXXXXXX 

hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh 



PRO hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTS LW EEEN A LH E EERA FWHENNGH V AGEQML EOC P H N AN RGQR LL A FS RG RA 

3EG . . .xxxxxxxxxxxk 

PRO ccchhhhhhhhhhhhhhhhhhccccchhhhhhccececcccchhhhhhhhccc 



Prosite for DKFZphtes3_72kll.1 



PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 35->38 PKC_PHOSPHO_SITE PDOC00005 
PS00005 71->74 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00008 81->87 MYRISTYL PDOC00008 
PS00342 231->234 MICROBODIES_CTER PDOC00299 
PS00029 142->164 LEUCINE_ZIPPER PDOC00029 
PS00029 149->171 LEUCINE_ZIPPER PDOC00029 
PS00029 156->178 LEUCINE_ZIPPER PDOC00029 
PS00029 163->185 LEUCINE_ZIPPER PDOC00029 
PS00029 170->192 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_72kll.1) 

DKFZphtes3_72kl5 



group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus 
norvegicus actin-filament binding protein Frabin. 

FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel F-actin-binding protein. The 
gene locus FGD1 seems to be responsible for faciogenital dysplasia or Aarskog-Scott syndrome. 
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in 
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase 
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 
and the actin cytoskeleton. Cdc42p is an essential GTPase. In yeast, Cdc42p transduces signals 
to the actin cytoskeleton to initiate and maintain polarized growth and to mitogen-activated 
protein morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events and 
induces the JNK/SAPK protein kinase cascade, which leads to the activation of transcription 
factors within the nucleus. 

The novel protein seems to be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as well as 
in modulating the JNK/SAPK pathway. 



strong similarity to actin-filament binding protein Frabin 

2 EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1845 bp 

Poly A stretch at pos. 1835, polyadenylation signal at pos. 1816 



1 GTGATGCAGA GTGCTCTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA 

51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT 

101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA 

151 CACATACACT TGGTTATTAA GAATGCGAGC AGCAAGGACT ATGGCAACAA 

201 CACAGTGAGT TTTCCCTTGA GTCTGTGAGG AAGCCCTCAG AGTTTGTGAC 

251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC 

301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT 

351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA 

401 GGACTGGCTA CACTCTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA 

451 AAAATGTTAG GAAGAGATGA TAAATACCTA AGTATTATAT CTAACTAAGT 

501 CTTTACTAAC TAGTCACATT ATTAAACACT GCAAGGATCA AGAAAAGTTA 

551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG 

601 AAATGTTCCA CTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA 

651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGCT CTACACTAAG 

701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA 

751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT 

801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT 

851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC 

901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC 

951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATCATACAG ATAAGACTCA 

1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA 

1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT 

1101 CAAGCTTCTG AACCCTTGCT 'iGATACGCAC ATAGTGAATG GAGAAAGAGA 

1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA 

1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC 

1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA 

1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA 

1351 AGGTAGAGCA TGAGACTAGC TCATCAGCAG GGAAAACCCT CCCTATTCGA 

1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA 

1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA 

1501 CCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG 

1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG CAGTCAGAAC ACTGTGGAAA 

1601 CTTTAATATA CGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT 

1651 CTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC 

1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA 

1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG 

1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA 



BLAST Results 



NO BLAST result 
Medline entries 

98334590: 

Frabin, a novel FGD1-related actin filament-binding protein capable of 
changing cell shape and activating c-Jun N-terminal kinase. 



Peptide information for frame 3 

ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility 

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 
51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 
101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 
151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 
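
The relation between the insert sequence and the frame 3 peptide can be checked with a short translation sketch. The example below assumes the standard genetic code and takes the first 30 nucleotides of the ORF (positions 810-839 of the insert, read from the sequence listing above). The function name and the use of "X" for unreadable codons are illustrative choices, not part of the original analysis pipeline.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()      # drop the blanks between 10-mers
    usable = len(dna) - len(dna) % 3        # ignore an incomplete last codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    # Codons containing characters other than T, C, A, G become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Positions 810-839 of the insert (assumed 1-based, as in the ORF line).
orf_start = "ATGTTTTCATGCTTCCTTTGTATTTTATCT"
print(translate(orf_start))  # MFSCFLCILS
```

The output corresponds to the first ten residues of the peptide listed above.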



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72k15, frame 3 

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; 
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete 
cds., N = 1, Score = 428, P = 1.8e-39 

>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 
Length = 766 



Query: 12  SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71 
           S LS+Y+D++K+S +NLN P+TP +HGLT+T  QKL S   PQ+Q  D+D+ QG   C+A 
Sbjct: 31  SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90 

Query: 72  NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131 
           NGV AAQ+QMECE EK A LS +T  Q +    D H++NG R+ET T  AS  T+S D N 
Sbjct: 91  NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150 

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185 
           A DSS RT G    LP +E     E ++QERENG S   L  LDQHHE+K  +E 
Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204 

Pedant information for DKFZphtes3_72k15, frame 3 

Report for DKFZphtes3_72k15.3 

[MW] 20388.32 
[pI] 4.62 
[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38 
[KW] All_Alpha 
[KW] SIGNAL_PEPTIDE 16 
[KW] LOW_COMPLEXITY 12.77 % 

SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT 
SEG . xxxxxxxxxxxxxx 
PRD ccchhhhhccccccccccccececccccccccccccceccceeehhhhhhhceecccccc 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP 
SEG xxxxx 
PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 
SEG xxxxx 
PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 
SEG 
PRD hhhhhccc 
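
The PRD lines can be summarised as per-state fractions. The sketch below assumes the usual three-state convention (h = helix, e = extended strand, c = coil); the report itself does not define the letters, and the function name is hypothetical.

```python
from collections import Counter


def state_fractions(prd: str) -> dict:
    """Return the fraction of h, e and c states in a PRD prediction string."""
    states = "".join(prd.split())           # tolerate blanks left by scanning
    counts = Counter(states)
    total = len(states)
    if total == 0:
        raise ValueError("empty prediction string")
    # Counter returns 0 for states that never occur.
    return {state: counts[state] / total for state in "hec"}


# Last PRD line of the report above (residues 181-188).
print(state_fractions("hhhhhccc"))  # {'h': 0.625, 'e': 0.0, 'c': 0.375}
```

Applied to the concatenated PRD lines of a whole protein, the helix fraction gives a rough check on a keyword such as All_Alpha.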



(No Prosite data available for DKFZphtes3_72k15.3) 
(No Pfam data available for DKFZphtes3_72k15.3) 
DKFZphtes3_72p16 



group: intracellular transport and trafficking 

DKFZphtes3_72p16 encodes a novel 796 amino acid protein with very strong similarity to the 
product of the Mus musculus maternal-embryonic 3 (Mem3) gene. 

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and 
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also actively 
transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is similar to 
yeast VPS (vacuolar protein sorting) 35. In yeast, the null allele of VPS35 results in a 
differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), 
proteinase B (PrB), and alkaline phosphatase (ALP). 

The new protein can find application in modulating the sorting of proteins into different 
compartments. 



strong similarity to mouse MEM3 and yeast VPS35 
Sequenced by DKFZ 
Locus: /map="16p13.3" 
Insert length: 2707 bp 

Poly A stretch at pos. 2697, no polyadenylation signal found 

1 CTACGCGCGC GCCGCCTCCT CCTTGCrGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC ACCAGTCCCC TCAGGATGAG CACCAAAACC TCTTGGATGA 
101 AGCCATACAG GCTCTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGC 
151 ACAAAAACAA GCTTATCGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCAAAG AGTTACTATC AACTTTATAT 
251 GGCCATTTCT GATCAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAACGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA 
401 TGTCAAGTCA TTTCCTCAGT CCAGCAAGGA TATTTTGAAA GATTTGGTAG 
451 AAATGTGCCG TGCTGTGCAA CATCCCTTGA GCGGTCTGTT TCTTCGAAAT 
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTACTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 
751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA 
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA 
851 GCTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGC 
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 
1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTCGCTACAG 
1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTCT ATCTTTACAA 
1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 
1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 
1201 TTGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 
1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTCAAATT 
1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 
1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TCCATTATAA CACAGAAATT 
1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 
1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG 
1501 CTGATCAGCA GAGCCTTGTG GCCCGCTTCA TTCATCTGCT CCGCTCTGAG 
1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 
1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 
1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 
1701 AAATGGGAAA AGAAATGCCA CAAGATTTTT TCATTTGCCC ACCAGACTAT 
1751 CAGTCCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 
1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 
1851 CTCGCATATG AATTCATCTC CCAGCCATTT TCTCTGTATG AAGATGAAAT 
1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 
1951 TTGAAAGGAT GAAGTCCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 
2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 
2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 
2101 AAAATGGGCA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 
2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 
2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 
2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 
2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 
2351 ACATTTTCAT AACACACTCG AGCATTTGCG CTTGCGGCGG GAATCACCAG 
2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 
2451 TCACCATACT CCTTTCCATG TACATCCAGT GACGGTTTTA TTACGCTAGG 
2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGCT AGGTTTCCCA 
2551 TTTCTTACCT GTCATCTCTT TTACCCAGCA CCTCCGGACA CTCACCTTCA 

2601 GGACCTTAAT AAAATTATTC ACTTCCTAAG TGTTCAAGTC TTTCTGATCA 

2651 CCCCAACTAG CATGACTGAT CTGCAATTTA AAATTCCTCT GATCTGTAAA 
2701 AAAAAAA 



BLAST Results 



Entry AC007225 from database EMBLNEW: 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 
Score = 1081, P = 2.6e-217, identities = 219/221 
13 exons 

Entry H5015146 from database EMBL: 
human STS WI-6846. 
Score = 2033, P = 2.9e-87, identities = 425/436 



Medline entries 



96327632: 

Genetic mapping and embryonic expression of a novel, maternally 
transcribed gene Mem3. 

97256867: 

Endosome to Golgi retrieval of the vacuolar protein sorting receptor, 
Vps10p, requires the function of the 
VPS29, VPS30, and VPS35 gene products. 

92360909: 

Alternative pathways for the sorting of soluble vacuolar proteins in 
yeast: a vps35 null mutant missorts and 
secretes only a subset of vacuolar hydrolases. 

10196044: 

Distinct Domains within Vps35p Mediate the Retrieval of Two Different 
Cargo Proteins from the Yeast 
Prevacuolar/Endosomal Compartment 



Peptide information for frame 3 



ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 
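
The peptide length follows directly from the ORF coordinates. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon is not part of the stated range; both assumptions are inferred from the reported values, not stated in the report. The function name is illustrative.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF of the frame 3 peptide described above.
print(peptide_length(48, 2435))  # 796
```

The same arithmetic reproduces the other ORFs in this listing: 810 to 1373 bp gives 188 residues, and 410 to 1738 bp gives 443 residues.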



1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML 
51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY 
101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR 
151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ 
201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV 
251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII 
301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL 
351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR 
401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE 
451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS 
501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD 
551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE 
601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR 
651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL 
701 KKALKIANQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI 
751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72p16, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05.7"; Arabidopsis thaliana BAC 
TM017A05., N = 1, Score = ?27, P = 1.9e-162 
PIR:S56936 vacuolar protein-sorting protein VPS35 - yeast 
(Saccharomyces cerevisiae), N = 3, Score = 626, P = 1.5e-116 

= 1, Score = 3376, P 

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar 
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt], N = 
3, Score = 813, P = 4.4e-115 

>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds. 
Length = 754 



Query: 78 EVYLTDEFAKGRKVADLYELVQYAGHI I PRLYLLITVGWYVKSFPQSRKDILKDLVEMC 137 

♦VYLTDEFAKC + + A DL Y EL VQ Y +GN 1 1 P RL Y LL I T VG W YV K S FPQS RKDI LK DLV EMC 
Sbjct: 34 K V YLT DEFAKG ERLA DL YELVQ Y5GN I I P RL YLL I I VG WYVKS FPQS RKD I LK DLV EMC 93 

Query: 138 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGOISDSMDFVLLKFAEMilKLtfVRM 197 

RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLHFAEKNKLWVBM 
Sbjct: 94 RGVQH PLRGL FLRN Y LLQCT Rtt I L P DEC E PTDEETTGD I S DSMD FV LLN FA EHH K LK V RM 153 

Query: 198 QHQGHSRDREKREREROELRILVGTNLVRL5QLEG-VNVERYKQIVLTGILEQVVNCPJJA 256 

QHQGHSRDREKREREROELRILVGTNLV Lt ♦ +Q I VLTG I LEQWNC RDA 

Sbjct: 154 QHQGH SRD REKRE RERQE LRI LVGTN L V A LTLV SWRC KCGT LOO I V LTG I L EQV VNC R D A 213 

Ouery: 257 LAQE Y LMEC 1 1 QV FPO E Ft) LQTLN P FL RAC AELHQN VNV KM 1 1 1 A L 1 D RLALFAHRE DCP 316 

LAQE KECIIQVFPOEFHLQTLNPFLRACAELHQNVHVKKI1 IALIDRLALFAHRE P 
Sbjct: 214 LAQE I SKECI IQVFPDE FHLOTLH PFIRACAELHQMVHVKK 1 1 1 ALI DRLALFAHREMEP 27 3 

Query: 317 GIPADIKLFDIFSQOVATVIQSRODMPSECWSLQVSLIMLAMKCYPCRVDYVDKVLETT 376 

GI PA*+KLFDI FSQQVATV I QSR+ DMPSEOVVSLQVSLI N LAMKC X PDRVDYVDKVLETT 
Sbjct: 274 G I P AE LKL FDI FSQQV AT V I QS RRDMP 5 EDVVS LQ VS L I H LAMKC YPDRVDYVDKVL ETT 333 

Query: 377 VEIFHKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESR— K 434 

VEIFNXLHLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES K 
Sbjct: 334 VEI FNKLHLEH I ATSSAVSKELTRLLKI PVDT YNNI LT VLKLKHFH PLFEY FDYESS PGK 393 

Query: 4 35 SMSCYVLSHVLDYHTEIVSQDQVOSIMHLVSTLIQDQPDQPVEDPDPEOFADEQSLVGRF 494 

SMSC YVL5NVL D YNT E I VSQDQVDS IMNLVSTLI QOQ PDQPVED POPED FA DEQSLVGRF 
Sbjct: 394 SMSCYVLSMVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPOPEDFADEQSLVGRF 4 53 

Query: 4 95 I HLL RS ED P DOQ Y LILNTARKH FGAGGN QR I R FTL P P L V F AA YQLA FR Y K ENS K V DDKWE 554 

I HLLR5 +DPDQQYL I LNT ARKH FGAGGN QRIRFTLPPLV FAA YQLA FRYKENSK + 
Sbjct: 454 I HLL RS DD P DOQ YL I LMT ARKH FGAGGNQR I RFTL P PL V F AA YQLAFR YK ENS KWMT S GK 513 

Query: 555 KKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 614 

+ F HOT I SAL I KAELAE L P LRL FLQG ALAAGE I G FEN H ET V A Y E FMSQ A T S L Y 

Sbjct: 514 RNARRYFHLPHQTISALI KAELAEL P LRL FLOGALAAG E I G FENH ET V A Y E FMSQA FS L Y 573 

Query: 615 E DE I S DS KAQLAA I T L 1 1 GT FERMKC FS E EM H E P LRTQC ALAASKLLKK POQG RA V STC A 67* 

EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ C 
Sbjct: 57* EDEISDSKAQLAAITLIIGTFERMKCFSEEMHEPLRTECALAASKLLKKPDQAEREHMCT 633 

Query: 675 H LFttSG RNT DKNG EE LHGG K R VMEC LKK ALK I ANQCMD P S LQVQL F I E I LN R Y I Y FY E K E 734 

L WSGRNTDKNG EELHGGK RVMEC LKKALK I ANQCKD P S LQ VQl r I E I LNRYI YFYEKE 
Sbjct: 634 S L- ttSG RNT DKNG EELHGGKRVMEC LKKALK I ANQCMD PS LQVQL FIEILNRYIYFYEKE 692 

Query: 735 NDAVT I QVLNQL I QKI REDLPNLESSEETEQI NKHFHNTLEH LRLRRES FESEGP I YEGL 794 

NOAVT I QVLNQL I QK I REDLPNLESSEETEQINKHFKNTLEHLR RRESPESEGPI YEGL 
Sbjct: 693 K DA VTI QVLNOL I QK I REDLPNLESSEETEQINKHFHNTLEHLRT RRESPESEGPI YEGL 752 

Query: 795 1L 796 
IL 

Sbjct: 753 IL 754 

Pedant information for DKFZphtes3_72p16, frame 3 

Report for DKFZphtes3_72p16.3 

[MW] 91723.67 
[pI] 5.32 
[HOMOL] TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus maternal-embryonic 
3 (Mem3) mRNA, complete cds. 0.0 
[FUNCAT] 30.23 vacuolar and lysosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.22 endosomal organization [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110 
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110 
[BLOCKS] BL01092Q 
[PIRKW] yeast vacuole 1e-108 
[PIRKW] membrane protein 1e-108 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 5.40 % 



SEQ M PTTQQS PQDEQE KL LDEA I QA VKVQS FQMKRC l> D K NKLMDS LKH A S NM LGE LRT SM LS P 

SEG 

PRO cccccccccehhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEW 

SEQ KSYYELYKAISDELHYLEVYLTDEFAKGRKVAOLYELV0YAGNI IPRLYLLITVCWYVK 

SEG 

PRO cceeeaehhhtihhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceceeeceeeee 

KEM MMMKHMMHMMKKMH 

SEQ S FPQS RKD I LK D LVEHC KG VQH P L RGL FLRN Y LLQCT RN I L P D EG E PT OEETTGD I S DSM 

SEG , , xxxxxxxxxxxxxx 

PRO ecccchhhhhhhhhhhhhhccccc hhhhhhhhhhhhhh he c cc c ccccccc cc cc cccc h 

MEN MMMMMMMMMM 

S EQ OrVLLNFAEKSK LHVRMQHQGKS RDREKREREROELRI LVGTHLVRLSQLEGVNVER YK0 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccccnhhhhhhhhhhhhhhhhccchhhhhhhhccchhhhhh 

MEM 

SEQ I VLTGI LEQWN C RDALAQE YLMEC IIQVFPD EFHLQTLN P FLRACAE LH QNVNVKN 1 1 1 

SEG 

PRO hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh 

HEM 

SEQ AX I DRLALFAHREDGPGIPAD I KLFDI FSQQVATV I QSRQOMPSE0WSLQVS LI NLAHK 

SEG 

PRD h h hh h hh h hh h hcccccccccch hh hh h hhhhhhhhhhc cccc cccc hhhhhh hhhhhhh 

MEM 

SEQ C Y PDRVOYVDKVLETTVEI FNKLNLEK I ATSSAVSKELTRLLK I PVDTYMN I LTVLKLKH 

SEG 

PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh 

MEM 

S EQ FHPLFEYFDYES RKSMSC YVLS NVLDYNTEI VSQDQV DS I HNLVSTLI QDQPDQPVEDPD 

SEG xxxxxxxxxxxx 

PRD hhhheeecccchhbhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc 

MEM 

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRI RFTI,PPLVFAAYQLA 

SEG xxx 

PRD ccccchh^lhhhhhhhhhhhccccchhhhhhhhhhnhhcccccceee*eccchhhhhhhh^ 

MEM 

SEQ FR Y K EN S K VDOKW E K KCQK I FSFAHQT I SALI KAELAELPLRL FLQGALAAGE I G FENH E 

SEG 

PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^^hhhhhhhhhccccc 

MEM 

SEQ TV A YE FMSQAFS L YEDE I 5 DS KAQLAA I T L 1 1 GT FERKKC rs EENH E P LRTQCA LAAS KL 

SEG 

PRD eeeeehhhhhhhnhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

MEM 

SEQ LKK PDQGRA V ST C AHLrWS GRNT OKNG EELHGG K R VMEC L K KAL It I ANQCKD PS LQVQLF 

SEG 

prd hhcccceeee ecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh 

KEM 

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQTNKH FHltTLEHLRLR 
PRO hhhhhhhhnhhcccccteee«hhhhhhhhtihhhhhhhhccccchhhhhhhhhhhhhhhhh 

KEM 

SCO RES PES EGPI YEGLIL 

SEG 

PRO hhcccccccc«*«ccc 

KEM 



(No Prosite data available for DKFZphtes3_72p16.3) 
(No Pfam data available for DKFZphtes3_72p16.3) 
DKFZphtes3_7b22 



group: cell structure and motility 

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins. 

The novel protein is related to paramyosin, a major structural component of the thick filaments 
of invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. 

The new protein can find application in modulating cell adhesion/motility and 
membrane/cytoskeleton structure and dynamics. 



similarity to paramyosins 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213 



1 GGAAGAAAGG CTAGCCGGCG TTCGCCCTAT GTGGCTGTCT TGAGCCAGTT 
51 TTTCACTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 
101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 
151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TCCACCTGAG ATAAGGGGGA 
201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA 
251 TACAAAATGG GAAATTCCGA CAAATCCCAG TGGCTCATGA CACTAAGAAG 
301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 
351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGC GTAACGAAGC 
401 TACAGAAGAA TGCAACAAGA CAGCCTCGAA GACTCAAACC TTCCTCCAAA 
451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 
501 CCGTAGAACA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 
551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 
601 CTCCGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 
651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA 
701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC 
751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC 
801 CACAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 
851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC 
901 ACTGAAGAAA ATTCAGATTG ATAGCCAGTT TTTCAGCGAT GTGATTGCAG 
951 ATACCATTAA GGAGTTGCAA CATTCGGCCA CTTACAACAG TCTCCTGCAA 
1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 
1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 
1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 
1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATCAAGGCAA AATCCAACTT 
1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT CCCCAGACCC 
1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 
1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT 
1351 GTTCCTTAGA AAGGAGCAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 
1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT 
1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA 
1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCCT ATAGAAAAGG 
1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 
1601 ATAAAGCTCC AGGCCTGGTG CCGAGGCACT ATCATACGCA GACAAATTGG 
1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTCA TAGCAAGGAT TCAAAAGGCA 
1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 
1751 CTTTTGTCTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 
1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 
1851 TGAGACTTTC CCAGCGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 
1901 TGCCTGTTAG CTCGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT 
1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 
2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 
2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 
2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 
2151 TCAGTAGCAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 
2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 
2251 AAAAAAAAAA AAAAAAAAAA i 
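
The reported poly A and polyadenylation signal positions can be related to the sequence with a small search sketch. It assumes that the signal is one of the two common hexamers AATAAA or ATTAAA, that the poly A stretch is the first run of at least eight A residues, and that positions are reported as 0-based offsets. The report states none of this; the assumptions are chosen because they reproduce the values given for this clone. Function names and the run-length threshold are hypothetical.

```python
import re


def find_polya_signal(seq: str, motifs=("AATAAA", "ATTAAA")) -> int:
    """0-based offset of the last signal hexamer, or -1 if none is present."""
    seq = "".join(seq.split()).upper()
    return max(seq.rfind(motif) for motif in motifs)


def find_polya_stretch(seq: str, min_run: int = 8) -> int:
    """0-based offset of the first run of at least min_run A residues, or -1."""
    seq = "".join(seq.split()).upper()
    match = re.search("A{%d,}" % min_run, seq)
    return match.start() if match else -1


# Positions 2201-2250 of the insert, copied from the listing above.
tail = "ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA"
offset = 2200  # number of bases preceding the fragment

print(offset + find_polya_signal(tail))   # 2213
print(offset + find_polya_stretch(tail))  # 2241
```

Both values agree with the positions reported for this insert.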



BLAST Results 



Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA. 
Score = 2262, P = 1.3e-97, identities = 462/469 



Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 



1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP 
51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN 
101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV 
151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS 
201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN 
251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM 
301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA 
351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL 
401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7b22, frame 2 
SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08 

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08 



[Alignment text not recoverable from the source. Surviving block start positions 
(Query/Sbjct): 142/169, 202/226, 258/283, 317/341, 375/394. A second segment 
(Score = 118) begins at Query 181, Sbjct 218, with the query fragment 
DTIKELQDSATYNSLLQ--.] 
Query: 236 EWQFE--VQSQNEY-IANLKDQLQEMKAKSNLENRYMKTNTE-LQIAQTQKKCNRTEELL 291 

E ♦++ E+ +A ++ + K+K * E E L+ QK+ E+ + 

Sbjct: 278 ALOEESAARAEAEHKLALANTEITOWKSRrDAEVALHHEEVEDLRKKMWlKQAEYEEQIE 337 

Query: 292 V EE I E KLRMKT EEEARTHT E I EMF LRK EQQK LE - - E RL £ FWMEK Y DKDT EMKQN E LN 346 

+ ++K+ + ++R +E+E+ L K 0 + ER + *EK ♦ +++ +EL 
Sbjct: 33B I H- LQK I SQLEKAK5 RLQS EVEVLIVDL CKAQNT I A I L ERA K EQL EKTVN ELK VRI DELT 396 

Query: 347 A-LKATKASDLAHLOOLAKHIREYEQVlIEDRIEKER5KKKVK0DLLeLKSVI 39S 

L+A + A L + L K+ YE + + E +■ R KK++ DL E K ♦ 
Sbjct: 397 VELEAAQREARAALA EXQKLKN L Y EKA V - EOKEALAREH KKLQ DO LHEAKEAL 446 

Score = 107 (16.1 bits), Expect = 2.1e-02, P = 2.1e-02 
Identities = 49/279 (17%), Positives = 124/279 (44%) 
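
The percentages in the summary line can be recomputed from the match counts. The sketch below parses a cleaned summary line and uses integer division, on the assumption that the reported percentages are truncated and not rounded (49/279 is about 17.6 %, reported as 17 %). The function name and the regular expression are illustrative.

```python
import re

FRACTION = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")


def alignment_percentages(summary: str) -> dict:
    """Map 'identities' and 'positives' to truncated integer percentages."""
    result = {}
    for name, matched, length in FRACTION.findall(summary):
        # Integer division truncates, e.g. 4900 // 279 == 17.
        result[name.lower()] = 100 * int(matched) // int(length)
    return result


line = "Identities = 49/279 (17%), Positives = 124/279 (44%)"
print(alignment_percentages(line))  # {'identities': 17, 'positives': 44}
```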

Query: 123 I T EEGPNL P E I RHRGRFA V - EFNKMQDL V FKK PT ROT I MTTET L KK I QI D RQFFS DV I AD 181 

IE L + R A+ E K+**L K ++ + E KK*Q D * +AD 

Sbjct: 392 IDE LTVELEAAQREARAA LAELQK LKN L Y EKA V EQ K EA L AREN - K K LQDDLH EAK EA LAO 4SO 

Query: 182 TIKELQDSATYNSLLQALSKERENKMHFYDI IAREEKGRKQ--I ISLQKQ1.INVKKEWQF 239 

♦+L ♦ H* L +E + ♦ ♦ R+ ♦ R Q + LQ+ I Q 
Sbjct: 451 ANRKLHELDLENARLAGEI RELOTALKESEAARRDAENRAQRALAELQQLRI EMERRLQE 510 

Query: 240 E VQSON EY I AN LKDQLQEMKAK SN L ENR YMKTNT ELQI AQTQK KCN RT E - EL L VEE I EM, 298 

♦ + N++ ++ ♦ A L t ♦ E+ ♦ ♦ + E E+ V+ + + 

Sbjct: 511 KEE EMEALRKHMQFE I DRLT AA - - LADAEARMKAE ISRLKKKYQAEIAEL EHTVDMLNRA 568 

Query: 299 RMKT EEEA RTH T E I EMFLRK EQQK LE ERL E FWHEK TDK 0T EKKQN EIN A L KAT KAS DLAH 3 SB 

♦ ♦ + + + *t I* ♦ ♦ + L+ + * + Y ♦ Q + + +AL A + + 

Sbjct: 569 N I EAQKT I K KQ5 EQLK Z LQASLEDTQRQ LQQT L DQ Y ALAQRKVSA1SA-ELEECKV 623 

Query: 359 LQDLAKMIREYEQVIIEDRIEKERSKFKVKQDLLELKSVIKLQ 401 

OA R+ ♦+ +E+ + V +L +Kt + + ♦ 

Sbjct: 624 ALDN A I RARFQAE I DL EEANGR I T DL VS VNNN LT A X KN K L E T E 666 



Pedant information for DKFZphtes3_7b22, frame 2 

Report for DKFZphtes3_7b22.2 

[LENGTH] 443 
[MW] 51917.95 
[pI] 6.18 
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08 
[FUNCAT] 30.03 organization of cytoplasm [S. 
[FUNCAT] 08.07 vesicular transport (golgi network, 
7e-07 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 5e-04 
[EC] 3.6.1.32 Myosin ATPase 3e-08 
[PIRKW] phosphotransferase 6e-06 
[PIRKW] citrulline 8e-06 
[PIRKW] tandem repeat 1e-07 
[PIRKW] heart 6e-06 
[PIRKW] polymorphism 4e-06 
[PIRKW] serine/threonine-specific protein kinase 6e-06 
[PIRKW] DNA binding 8e-08 
[PIRKW] muscle contraction 1e-0? 
[PIRKW] actin binding 3e-08 
[PIRKW] ATP 3e-08 
[PIRKW] thick filament 1e-07 
[PIRKW] phosphoprotein 3e-08 
[PIRKW] glycoprotein 4e-06 
[PIRKW] skeletal muscle 1e-07 
[PIRKW] calcium binding 8e-06 
[PIRKW] alternative splicing 3e-08 
[PIRKW] coiled coil 3e-08 
[PIRKW] P-loop 3e-08 
[PIRKW] heptad repeat 4e-06 
[PIRKW] methylated amino acid 3e-08 
[PIRKW] basement membrane 4e-06 
[PIRKW] cardiac muscle 6e-06 
[PIRKW] extracellular matrix 4e-06 
[PIRKW] hydrolase 3e-08 
[PIRKW] membrane protein 4e-06 
[PIRKW] EF hand 8e-06 
[PIRKW] cytoskeleton 8e-06 
[PIRKW] hair 8e-06 
[SUPFAM] myosin heavy chain 3e-08 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 
[SUPFAM] calmodulin repeat homology 8e-06 
[SUPFAM] myosin motor domain homology 3e-08 
[SUPFAM] trichohyalin 8e-06 
[SUPFAM] protein kinase homology 6e-06 
[PROSITE] AMIDATION 
[PROSITE] CAMP_PHOSPHO_SITE 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] TYR_PHOSPHO_SITE 
[PROSITE] PKC_PHOSPHO_SITE 
[PROSITE] ASN_GLYCOSYLATION 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 
12 

SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD 
SEG xxxxxxxxxxxxxxxxxxxxxxx . 
PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeecccccccccccce 

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 
SEG 
PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 
SEG 
PRD hnnnhAhnhnhhhhW^hhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE 
SEG 
PRD hnnnhnhhhhlihhhhhhhhiihhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM 
SEG 
PRD hnhhnbhnhjiihi^hhnhnhhhhhh^ 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ 
SEG 
PRD bnhhnhhhji^hhhhhhh^ 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK 
SEG x 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 
SEG xxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccc 



Prosite for DKFZphtes3_7b22.2 

PS00001 285->289 ASN_GLYCOSYLATION PDOC00001 
PS00004 152->156 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005 
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005 
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005 
PS00005 383->386 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006 
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 182->186 CK2_PHOSPHO_SITE PDOC00006 
PS00006 243->247 CK2_PHOSPHO_SITE PDOC00006 
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006 
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006 
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 310->314 CK2_PHOSPHO_SITE PDOC00006 
PS00007 261->269 TYR_PHOSPHO_SITE PDOC00007 
PS00007 184->193 TYR_PHOSPHO_SITE PDOC00007 
PS00009 218->222 AMIDATION PDOC00009 
PS00009 439->443 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_7b22.2) 
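
The PS00006 (CK2_PHOSPHO_SITE) entries above can be cross-checked against the peptide with a pattern scan. The sketch below uses the PROSITE consensus [ST]-x(2)-[DE] for casein kinase II sites, written as a regular expression with a lookahead so that overlapping matches are reported. Only the first 60 residues of the frame 2 peptide are scanned; the function name is illustrative.

```python
import re

# PROSITE PS00006: [ST]-x(2)-[DE]; the lookahead keeps overlapping matches.
CK2_SITE = re.compile(r"(?=[ST]..[DE])")


def ck2_site_starts(peptide: str) -> list:
    """1-based start positions of all CK2 consensus matches in a peptide."""
    return [match.start() + 1 for match in CK2_SITE.finditer(peptide)]


# Residues 1-60 of the frame 2 peptide.
first_60 = "MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD"
print(ck2_site_starts(first_60))  # [5, 30, 31, 41, 57]
```

Starts 5, 30, 41 and 57 correspond to the tabulated entries in this range. Position 31 also fits the consensus but is not listed; the original scan may have suppressed some overlapping matches, which is an assumption and cannot be confirmed from the report.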
DKFZphtes3_7d17 

group: testes derived 

DKFZphtes3_7d17 encodes a novel 613 amino acid protein with weak similarity to human KIAA0454. 
Pfam predicts a TNFR/NGFR cysteine-rich region. 

No informative BLAST results; no predictive Prosite or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 

similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570 



1 GGGAAGTTAC GGCCAACTCC ACCCAGCCTT TCTCAGCCAA TCTGAAGGCA 

51 AATCCTCTTT AGACCCAGCC CAACGTTCCT CGTGACCCAG CCTCTCACCA 

101 GCCAATTCTC CCTTGCCCTC CTCCTGAGGG TATCTCGAGC TTCACTCCTG 

151 TGTGCTCTTC CCCTCCACAC TGGGGATGCC ACTCACTCCC ACTCTCCACG 

201 GCTTCCAGTC GACTCTCCCA CGCCCTGATG TACAAACTTC CCCATTCGGT 

251 GCACCAAGAG CACCCTCACA TCCTGTCGGC CCACATCAAG AGCTGCGAGA 

301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 

351 TATCTGCCGG CCCTTCGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA 

401 ATCAACAAGA AATCGCGCCC CCAGCTCGCA GAGAACAAAC AGCAGTTCAG 

451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA 

501 ACCGCCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 

551 ATGCTGAGGC ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 

601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 

651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGC GAGAGATGCC 

701 TCCCGCTCAT TCAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 

751 CGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA 

801 GGCTGGCACA CCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 

851 GAGGATGAAG ATGTTAAACT TGAGGAGGCT GAGAAACTAC AGGAATTATA 

901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGACGACT 

951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 

1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 

1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATCG TTGGATGCTG 

1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 

1151 GGGCCACTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 

1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 

1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 

1301 CAGCAAGTCC GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 

1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC

1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTCGA TAGATTTTAT 

1451 TCAACTCCTT TTGAGTACCT GCAACTGCCT GACTTATGCC AGCCCTACAG

1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 

1551 TGGACAGAAT GAAAAAGGAC CAAGAACAGC AAGAAGACCA AGGCCCACCA 

1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 

1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCACT TATCCAGAAC 

1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 

1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 

1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 

1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 

1901 TCGACTACTT CAACTTACTT TCAACTACAT CCCTCATTCC AGCAGTACAG 

1951 AAGTGCCTTT TACTCATTTG ACGAACAGGA CGTCAGCTTG GCCCTTGACG 

2001 TGCACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC 

2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 

2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 

2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 

2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CACTCTGAAG 

2251 ACGTTGGACC CAAGTTACGT GTGACACGTT CACACGACTA TGTAGCACAT 

2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 

2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 

2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 

2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGT* TCTTCAGTGT

2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 

2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTCT TAACCCACTA 

2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCACCCTCCA ATTGATATCA 

2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG CGAGGCCTTA 
2701 GTCCTCCTCC TTTCAATTCC ATCCTGTAAA CAACACCAGT CAGGAGCCGC

2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC

2801 AATGCCATGT TCTTGCACAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA

2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT

2901 CTCCTTCACA CAGTCCACCT CACCACGAAT CACACAACAA AAAGGAGGAG

2951 AGATATTTTG CGTTCACAAC AAGTAAATGA TAATGTAGCT ACATTTCTTT

3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTCA

3051 TTTTGCTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA

3101 ATGCTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG

3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA

3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT

3251 CATGATATCA GGACTCCTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT

3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT

3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT

3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC

3451 GCTGCAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG

3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA

3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA

3601 AAAAAAAA



BLAST Results 



No BLAST result



Medline entries



No Medline entry



Peptide information for frame 2



ORF from 176 bp to 2074 bp; peptide length: 633
Category: similarity to known protein 
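
The ORF coordinates and the reported peptide length are linked by simple arithmetic. The following is a minimal sketch in Python, assuming the coordinates are 1-based and inclusive and that the reported range does not include the stop codon; the function name `orf_peptide_length` is hypothetical and not part of the original annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based, inclusive coordinates."""
    span = end_bp - start_bp + 1          # number of nucleotides in the ORF
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3                      # one amino acid per codon


print(orf_peptide_length(176, 2074))  # 633
```

Under these assumptions, the span 176 to 2074 covers 1899 nucleotides, which corresponds to the 633 residues stated above.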



1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS

51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV

101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ

151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL

201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA

251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS

301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD

351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT

401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE

451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY

501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP

551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE

601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH
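
The peptide above is the conceptual translation of the cDNA starting at position 176. A minimal sketch of that translation step in Python follows, assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `ORF_START` are illustrative, and `ORF_START` holds only the 18 nucleotides at positions 176 to 193 as printed in the nucleotide sequence above.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


ORF_START = "ATGCCACTCACTCCCACT"  # positions 176-193 of the insert
print(translate(ORF_START))  # MPLTPT
```

Translating longer stretches this way is also a practical check on individual residues, since the nucleotide and peptide listings were transcribed independently.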

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7d17, frame 2

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1,
Score = 199, P = 1e-11

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1,
Score = 158, P = 2.7e-07



>PIR:T00069 hypothetical protein KIAA0454 - human (fragment)
Length = 1,682

HSPs:

Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 74/261 (28%), Positives = 122/261 (46%)
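
The percentages in the HSP header follow directly from the match counts. A small Python sketch is given below; it assumes the percentage is truncated rather than rounded, which is an inference from the figures printed here and not a documented property of the BLAST version used. The function name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage of matching positions, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


print(blast_percent(74, 261))   # 28
print(blast_percent(122, 261))  # 46
```

With rounding instead of truncation, 122/261 would print as 47%, so the truncation assumption matters for reproducing the header exactly.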

Query: 117 £ OCK DL I KSM LRD ERLLT EEKLAEELGOAEELAQYKVLVHSQERELTQLREKLQEG 172 

+ D + LI+ + + E L EEKLAEEL A *Y L+ Q REL* LR*K + + EG 

St> jet: 964 KPLESLIQRVSQLEAQLPKNGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023 
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Query: 173 RDASRSLKQH LQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 225 

r + +H + LL +♦ O C+ REQLA+G +L ♦ L KLS ++ 

Sbjct: 1024 RG I C YL I T RH AK DTVKS FED LLRS N D I DY Y LGQS FREQLAQGSQLT ERLT S K LST KDH KS 1083 

Query: 22 « EDE D VK VE EAEKVQEL Y A P REVQKAEEK - E V PE DSLEECA I TC SN SHHPC ES NQP YGNT R 2S4 

£ * +£ L RE*Q+ E+ EV ♦ L+ +*T S*SH + S + + +T 

Sbjct: 1084 EKDQAGLEPLA LALS RE LQEKEK V I EV LQA KLD ARS LT P S SSH ALS DS KRS PS ST 5 1139 

Query: 265 I T FEEDQV — DS T L I DSS S H 0 EW LDA VC 1 1 P ENES DH EQEEEKG P VS PRN LQES EEEEA P 342 

+ E + D +♦ +H E A P + *S ♦ S * * 

Sbjct: 1140 FLSDELEACSDMDIVSEYTHYEEKKAS PSHSD5IHHSSHSAVLSSKPSSTSASQGAK 1196 

Qu«ry: 343 QE5M DEG OWT LS I PPDMS A3 YQS DRST FH 311 

E3 + +L P ♦ S FH 

Sbjct: 1197 AES-NSHPISLPTPQNTPKEANQAHSGFH 1224 



Query: 464 K DQEEEE DQG PPCPRLSRELP EWE?- EDLQOS LDRWY S T P FS Y PEL PDSCQ- P YGS 516 

KD ♦ E+DQ P RLSREL E ♦ E LQ LD TP S L DS + P + 
Sbjct: 1079 KD H KSEK DQAGLE P LALRL5 RELQEKEK V I E VLQAK L D A US LT PS S S HALS DS H RS PS ST 1138 

Query: S19 CFYSLEEEHVCFSLDVDEIEKYQEGEEDQKPP 550 

r 3 E E D+D ♦ +Y EE ♦ P 

Sbjct: 1139 SFLSOELEACS DMD1 VSEYTH YEEKKA5 P 1167 



Query: 390 DQVKKEDQEATSP RLSRELLD-EKEPEVLQDSLDRFYSTPrEYLELPOLCQ- PYRSD 44 4 

D +*DQ P RLSREL + EK EVLQ LD TP L 0 + P + 

Sbjct: 1080 DHKS EKDQAGLEPLALRLSRELQEKEKVI EVLQAKLOARSLTPSSSHALSDSHRS PSSTS 1139 

Query; 4 45 FYSLOEQKLGLALDLDRMKKDQEEEEDQGPP 475 

F S L 0+D + ♦ EE + P 
SbjCt: 1140 FL5---OELEACS0MDI VSEYTH YEERKASP 1167 



Query: 31 SHGVGRHQELRDPTV— PGPTSSATNVSMWSAGPW5- -GEKAEHNILEIKKK 79 

S G +HQE +TVPPS + V A C++++ ♦ 

Sbjct: <64 S PGKHQM QEEGNVTVRP FPR PQS LOLGAT FT V DAKQLDNQSQ PRDPCPQSAFSLPGSTQH 743 

Query: eo SRPQLAEWKQQFRNLKOKCLVTOVAYFL-ANRQKHYOYE-OCKDLIKSMLRDERLLTEEK 137 

R QL+ + KQ+**+L++K L*++ F AN t * L+K + + ♦ *+ 

Sbjct: 744 LRSQ LSQCKQR YQDLQEKLLL5 EAT VFAQAN ELEK Y RVMLTG ES L V KQDS KQ 1 QV D LQDL 803 

Query: 138 LACE LGOAE E LRQ Y K v L VH S QERELTQLREK - LQEG 172 

E G++E + + + E L+E L EG 

Sbjct: 004 GYETCGRSENEAEREETTSPECEEKNSLKEMVLMEG 839 



Query; 123 IKSMLRDERLLTEEKLAEELGQAEE LRQ Y KV LVH SOERE LTQLREK LQEGRDAS R5 178 

♦+ ♦ D+ ♦ E ♦ E* EE LRQ ♦+ V ** *L *LR* L ♦* * 

Sbjct: 5 LRQR I H OKA V AL ERA I OEK FS ALE EK E K ELRQLRLA VRE RDH D LERLRDV LS SHEA 60 

Query: 179 LHQHLQA L LT P DE POHSOGROL RCQLAEGC RLAQH LVQK L 218 

Q +++LL ♦ ♦G ♦ ♦ EQL+ C+ Q L + 

Sbjct: 61 TMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 93 



Query: 134 TEEK-LAEELGQAEELRQY KVLVHSQERELTQLREKLQECROASRSLMQHLQALLT 188 

♦ E K L *LG + EE R Y *LV Lt + LQ t*L +++L 

Sbjct: 855 SERKPLENQLGKQEEFRVYCKSEMILV-- LRKDIKDLKAQLQNANKVIQNLKSRVRSLSV 912 

Query: 189 PDEPDWSQGROLREQLAEGCRLAQHLVQXLSPENDDDEDE 22B 

+ tS R R+ A G ++ SP ♦ DEDE 

Sbjct: 913 TSDYSSSLERP-RKLRAVGT LEGSSPHSVPDEDE 94S 



Query: 
Sbjct; 



127 LRD ERLLT EEK LA EELGQAEEL RQYKV LVH SQERE LTQLREK LQEGRDAS RS LNQH L 183 

L E LL EK+A Q +£♦ R+ ++L+ ♦ L R +L E ARL L 
358 LTQE VLL LREK V AS V ESQGQE I SGNRRQQLLLM LEG--LVDERS RLNEALQAERQL Y S S L 415 
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Query: 184 QALLT P DE P DNSQ-* G RDLPXQLAEGC R LAQH L VQKL 216 

P++S+ R Lt *L EG ++ + +++ + 
Sbjct: 416 VKFHA--HPESSERORTLQVEL-ECAQVLRSRLEEV 4 48 

Score = 54 (9.1 bits), Expect = 2.7e+00, P = 9.3e-01
Identities = 61/264 (21%), Positives = 121/264 (45%)

Query: 3 LT PTVOG FQWTLRG P D V ET3 P FGA P RAAS HG VGRHQE - - LRDPT V PG PT S S AT KVSMWS 60 

L+ T Q QW L+ + + ET F + + + + L D SAT +♦ 
Sbjct: 79 L STTCQN LQH - LK * E EHETK - FS RWQKEQES 1 1 QQLQTS LH ORNKE VE DLS AT LLCK 132 

Query: 61 AG PWSGE KAEMN I LE I N KKS R" - PQLAEN KQQ FRM LKQKC LVTQV AY rLAM RQNN Y D If E 117 

GP E AE + *r R L*+ +Q L+ + + * ft* 

Sbjct; 133 LGPGQSE I AEELCQALORKERMLODLLSDRNKQV- - L£HEMC tOCLLQSVSTftEQE-SQA 189 

Query: 118 DC K DL I KSKLRDE RLLTEEK LAE ELGQAEELRQY K VL VK SQERELT QLREKLQEG" 172 

+ L++*** ER + L + LG + L ♦ ♦ +0+ E+T +L ♦ + +C 
SbjCt: 190 AAEK LVQALH - - E RN S ELQALRQ YLGG RDS LMS - OA P I SNOOAEVT PTG RLGKQT DQGSM 246 

Query: 173 RDAS RS LNQK LQALLT PDEPDN SQGRDL REQLAEGC RLAOH L VQK L5 PEN DDDE DE BVK V 232 

+ SR ♦ t A P ++ G DL + +A G L **LS H *E E + 

Sbjct: 247 QIPSRDDSTSLTAKEDVSIPRSTLC-DL-DTVA-G LEKELS— KAKEELELMAK 295 

Query: 233 E EAE K VQEL Y A PRE VQKAEEK EV PEDS LE EC A I T 266 

+E E EL A + + +E + E+ ♦ + ++T 
SbjCt: 296 K ERES QMEL5 ALQSMMA VQE E E LQVQAADME S LT 329 

Score = 49 (7.4 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 21/87 (24%), Positives = 39/87 (44%)

Query: 192 POM SQGRDLREQLAEGC RLAOHL VQK LS PEH DDDE DE DVKVE EAEK VQEL YAP REVQKAE 251 

P ♦ ♦Q LR QL** ♦ Q 1> +KL + ♦ t EK * ♦ + K ♦ 

Sbjct: 738 PGSTQ- - H LR5QL SQC KQR YOOLQEK LLLS EAT V FAQAN ELEK Y R VMLTG ES L V KQD 792 

Query: 252 EKEVFEDSLEECA I -TCSNSHHPCE5NQ 278 

K+* D L*+ TC S + E + 
Sbjct: 793 SKQIQVD-LODLGYETCGRSENEAEREE 819 

Score = 46 (6.9 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 19/77 (24%), Positives = 39/77 (50%)

Query: 112 tl N K D Y E DC K DL I K SMLRDERL LT £ EK LAE ELGQ A EEL RQ YK VL VH SQERELTQ LREK LQ- 170 

* +* E+ K* K + E ++T+E L+E QAE R+ + ♦ ♦ ♦ L+E+L 
Sbjct: 597 DGWEI EEDKE"KGEVMVETWTKEGLSESSLQAE-FRKLQGKLK1JAHNI INLLKEQLVL 653 

Query; 171 EGRDASRSLNQKLQALLT 188 

+ + + L L LT 
Sbjct: 654 SSKEGNSKLTPELLVHLT 671 



Pedant information for DKFZphtes3_7d17, frame 2



Report for DKFZphtes3_7d17.2



[LENGTH]
[MW]
[pI]
[HOMOL] (fragment) 2e-11
[BLOCKS]
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[PFAM] TNFR/NGFR cysteine-rich region
[KW] All_Alpha
[KW] LOW_COMPLEXITY
[KW] COILED_COIL



SEO MPLT PTVQG FQWT LAG P DV ET S P FGA P RAAS HGVGRHQE LRO PTV PC PTSS ATNV SMVVS 




COILS 



SEO AG PWSGE KAEMN ILEIHKKSR PQLAENKQOFRN L KQKC L VTQV A Y FLAN RQNN YD Y EDC K 

SEG 

pro ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhnhcccccccccch 

COILS 
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SCO D L I K SHL RDERLLT E EKLAE E LGQ A EE LRQYK V LVH 5QE RE LTOLKE K LQEGRDAS RS LW 

SEG 

PRO hhhhhhhhhhnhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS ccccccccccccccccccccccccccccccccccccc 

SEQ QHLQALLT P DE PDW S QGRDLP 5QLAEGCRLAQH L VQKLS PEN DDD E D E 0 VK VEEA E K VQE 

SEC XXXXXXXXXXXXXXXX. . 

PRD hhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhh 

COILS CCCCCCC 

SEQ L YAPREVQIWEEKEVPEDS LEECAI TCSNSHH PCESNQP YGMTRI TFEEDQVDSTLI DSS 

SEC 

PRD hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccct«»B«cccccccccccc 

COILS 

SEQ SHDEWLDAVC 1 1 PEMESDHEQECEKG PVSPRHLQES EEEEA PQESW DEGDVT LS I PPDKS 

SEG xxxxxxxxxxxxxxx 

pro ccchhhhh«««ccccccchhhhhhcccccccccchhhhhhhccccccccccccccccccc 

COILS 

SEQ AS YQS DRST FHS VEEOQVG LALO I G RHWC DQVKKEOQEATS PRLS RELLDE KE F E VLQOS 

SEG 

PRD ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccch^hhhhhhhh^lh«••ecc 



SEQ LDRFYSTPFEYLELPDLCQF YR5 DFYS LQ.EQHLGLALDLDRMKKDQEEEEDQGPPC PRLS 

SEG 

PRO hhhh h ccc • e e t ec cccc c c ccc cc h h h hh h hhhhhhhcchhhhh h h h hhcccecccccc 

COILS 

SEO REL P EVVEP ED LQDS LDRHYSTPFSYPEL PDSCQP YG5C FY S LEE EH VGFS LDVOEIEXY 

SEG 

PRO ccc e« • mccc h hhhhhhhhcc ccc c c eeeccec cccc c t • e e c cc« • tec c cc hh hhhh 

COILS 

SEQ QEGEEOQKPPCFRLNEVLKEAEEPEVLQDSLDRCYSTTSTY FQLHASFQQYRSAFYSFEE 

SEG 

PRO hcccccccccccchhhhhhhhhchhhhhccccc«ieccee«»hhhhhhhhhhhhhhhhhc 

COILS 

SEO QDVSLALDVDNRFFTLTVIRHHLArQHGVI FPH 

SEG 

PRD cchhhhhhcccehhhhhhhhhhhhtthhhhcccc 

COILS 



Prosite for DKFZphtes3_7d17.2

PS00001   54->58     ASN_GLYCOSYLATION   PDOC00001
PS00001   315->319   ASN_GLYCOSYLATION   PDOC00001
PS00005   13->16     PKC_PHOSPHO_SITE    PDOC00005
PS00005   329->332   PKC_PHOSPHO_SITE    PDOC00005
PS00005   365->368   PKC_PHOSPHO_SITE    PDOC00005
PS00005   401->404   PKC_PHOSPHO_SITE    PDOC00005
PS00006   188->192   CK2_PHOSPHO_SITE    PDOC00006
PS00006   259->263   CK2_PHOSPHO_SITE    PDOC00006
PS00006   286->290   CK2_PHOSPHO_SITE    PDOC00006
PS00006   295->299   CK2_PHOSPHO_SITE    PDOC00006
PS00006   300->304   CK2_PHOSPHO_SITE    PDOC00006
PS00006   317->321   CK2_PHOSPHO_SITE    PDOC00006
PS00006   336->340   CK2_PHOSPHO_SITE    PDOC00006
PS00006   345->349   CK2_PHOSPHO_SITE    PDOC00006
PS00006   372->376   CK2_PHOSPHO_SITE    PDOC00006
PS00006   427->431   CK2_PHOSPHO_SITE    PDOC00006
PS00006   447->451   CK2_PHOSPHO_SITE    PDOC00006
PS00006   505->509   CK2_PHOSPHO_SITE    PDOC00006
PS00006   522->526   CK2_PHOSPHO_SITE    PDOC00006
PS00006   597->601   CK2_PHOSPHO_SITE    PDOC00006
PS00008   25->31     MYRISTYL            PDOC00008
PS00008   207->213   MYRISTYL            PDOC00008
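
Prosite site entries such as these come from short sequence patterns that can be expressed as regular expressions. The sketch below, in Python, scans a peptide for three of them. The regular expressions are hand translations of the published PROSITE patterns (PS00005 `[ST]-x-[RK]`, PS00006 `[ST]-x(2)-[DE]`, PS00001 `N-{P}-[ST]-{P}`); the `start->end` convention used here (1-based start, end equal to start plus pattern length) is an assumption inferred from the ranges in this listing, and the names `PATTERNS`, `scan` and `fragment` are illustrative. The fragment is the first 20 residues of the frame 2 peptide.

```python
import re

# Hand-translated PROSITE patterns (regex form is an assumption of this sketch).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
}


def scan(seq: str, regex: str) -> list:
    """Return (start, end) pairs for all matches, including overlapping ones."""
    hits = []
    # The lookahead lets overlapping sites be reported separately.
    for m in re.finditer(f"(?=({regex}))", seq):
        start = m.start() + 1                       # 1-based start position
        hits.append((start, start + len(m.group(1))))
    return hits


fragment = "MPLTPTVQGFQWTLRGPDVE"  # residues 1-20 of the frame 2 peptide
print(scan(fragment, PATTERNS["PKC_PHOSPHO_SITE"]))  # [(13, 16)]
```

Such short patterns match frequently by chance, which is consistent with the summary's remark that none of the Prosite hits is predictive.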



Pfam for DKFZphtes3_7d17.2
TNFR/NGFR cysteine-rich region

•CpeGtYt DWNH vpqC lpCt rCt PEMGQYMvq PCTwTQWTVC * 

C + ++ +»)♦♦♦ + ♦+ + ♦ + + + + + *+vc 

274 CESHQPYG-NT-RITFEEDOVDS— TLIDSSSHDEWLDAVC 
DKFZphtes3_7j3

group: cell cycle

DKFZphtes3_7j3.2 encodes a novel 628 amino acid putative protein kinase, which is related to
C-TAK1, the Cdc25C-associated protein kinase.

Cdc25C is a protein phosphatase that controls entry into mitosis by dephosphorylating Cdc2.
Cdc25C function is itself regulated by phosphorylation: serine 216 phosphorylation of Cdc25C
mediates the binding of 14-3-3 protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to
C-TAK1 and is therefore likely to be involved in cell-cycle regulation as well.

The new protein can find application in modulating/blocking the cell cycle.

strong similarity to serine/threonine-specific protein kinases
complete cDNA, complete cds, potential start at bp 128, few EST hits
Sequenced by BMFZ
Locus: unknown



1 GTGCTTTACT GCGCCCTCTC GTACTGCTGT CGCTCCCCGT CCTGGTCCGG 
51 CACCTGTGCC CCGCCCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 
101 GCCGCCCTTG CTCACCTCCT GCTCCCCATG CAGTCGCTGG TTTTCGCCCC 
151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA CCTAGCCCGG CCCCTCGCGC 
201 AAGGGCTGAT CAACTCCCCC AAGCCCCTAA TGAACAAGCA GGCGGTCAAC 
251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CCCTACCAGT TCCTGGAGAC
301 CCTGGGCAAA GGCACCTACG CGAAGCTCAA GAAGGCGCGC GAGAGCTCGG
351 CGCGCCTGGT GGCCATCAAG TCAATCCGGA ACGACAAAAT CAAACATGAG
401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 
451 CCACCCTCAC ATCATTGCCA TCCATGAACT GTTTGAGAAC AGCAGCAACA 
501 TCGTGATCGT CATGGAGTAT GCCAGCCCCG GCGACCTTTA TGACTACATC 
551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA CCTAGGCATT TCTTCCGGCA 
601 CATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 
651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 
701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGCAGAC 
751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC 
801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 
851 ATCCTGCTCC ATGGCACCAT GCCCTTTGAT CGGCATGACC ATAAGATCCT 
901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 
951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 
1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 
1051 CACCCGACTG GGAGAGCACG AGGCTCCGCA TGAGGGTCGC CACCCTGGCA
1101 GTGACTCTGC CCCCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 
1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 
1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 
1251 ACAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 
1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 
1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGCG GTACAGGAGG 
1401 ACCCTCCGGA GCTCACCCCA ATCCCTGCGA GCCCAGGGCA GCCTGCCCCG 
1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGC 
1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 
1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 
1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 
1651 CTCCCAGACA GCCTTGCAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 
1701 ATGAACTCCC CCCACCTCGC CCCCTGGCCC GGGCCAGCCG ACCCTCAGGC 
1751 GCTGTGAGCG AGGACAGCAT CCTCTCCTCT GAGTCCTTTG ACCAGCTGCA 
1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTCT GTGTCTGTGG 
1851 ACAACCTCAC GGCCCTTGAG GAGCCCCCCT CAGAGCGCCC TGGAAGCTGC 
1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 
1951 AGACTCCCAG GACCTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 
2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 
2051 AGATGCAGCT GGTTGCACCC CCAGGGGAGA TGCCTTCTCC CCCACCTCCC 
2101 AGGACCTGCA TCCCAGCTCA GAAGCCTGAG AGGGTTTGCA GTGGAGCCCT 
2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 
2201 GTCTCTGTCT TCACCCCTGC TGAACGAAGA GGATACTAAA GAGAGCGGAA 
2251 CGGGAATCCC CCCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 
2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 
2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC 
2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 
2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 
2501 GCATCCTGGG AATGGTCTGC AGTAACGCTT CGTTATTTTT ATTTTTATTT 
2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG
2601 GCTACAGTGC AATCGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG
2651 CTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC
2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT
2751 CTCCATGTTG GTCAGGCTCC TCTCAAACTC CCCACCTCAG GTGATCCACC
2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCCCGCCC
2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT
2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC
2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC
3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG
3051 ACCTCACTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT
3101 ATCCATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT
3151 TTATGTTCTT GGCTTTCTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA
3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG
3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC
3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCT*
3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA
3401 AAAAAAAAAA AAAAAAAAAA



BLAST Results



No BLAST result



Medline entries



9820230'':

C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and promotes 14-3-3 protein binding.



Peptide information for frame 2



1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR

51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE

101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER

151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY

201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF

251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH

301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV

351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG

401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK

451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG

501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS

551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL

601 GDSCFSLTDC QEVTATYRQA LRVCSKLT



BLASTP hits



No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7j3, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_7j3, frame 2

Report for DKFZphtes3_7j3.2



[LENGTH] 628
[MW] 69612.39
[pI] 9.01
[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens
mRNA for KIAA0537 protein, complete cds. 1e-152

[FUNCAT] 01.05.04 regulation of carbohydrate utilisation [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision
repair) [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-42
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139w CTK1 - carboxy-
terminal domain] 2e-28
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-28
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 08.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae,
YPL031c] 5e-24
[FUNCAT] 01.04.04 regulation of phosphate utilisation [S. cerevisiae, YPL031c] 5e-24
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins
[S. cerevisiae, YHL007c] 6e-24



( FUNCAT ) 
(FUNCAT) 
(FUNCAT) 
(FUNCAT) 
(FUNCAT) 
6e-21 
( FUNCAT ) 
pelmityletion 

( FUNCAT) 
(FUNCAT) 
(FUNCAT) 
YNL183C) le-f 
[ FUNCAT) 

le-17 
[ FUNCAT ) 
(FUNCAT | 
( FUNCAT ) 
le-15 
(FUNCAT I 
5e-15 
( FUNCAT ) 
(FUNCAT) 
YBR097W) 2e-08 
(FUNCAT) 

(FUNCAT) 
2e-08 
(FUNCAT) 
[FUNCAT) 
Be-05 
[ FUNCAT ) 
cerevisiae, 
(BLOCKS) 
(BLOCKS) 
| BLOCKS ) 
I SCOP ) 
I SCOP) 
I SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
I SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP) 
(SCOP I 
(SCOP I 



10.05.11 key kinases [S. cerevisiae, YHL007C) 6e-24 
09.01 biogenesis of cell wall IS. cerevisiae, YNR031e) le-22 

10.03.11 key kinases [S. cerevisiae, YNR031c) le-22 
03.13 meiosis (S. cerevisiae, YDR523c] Be-22 

04.05.01.01 general transcription activities [3. cerevisiae, YDHOBw) 

OS. 07 protein modification (glycolsylatlon, acylation, myristylation, 
farnesylation and processing) [S. cerevisiae, YFL033C] 6*-21 

10.05.09 regulation of g-protein activity (S. cerevisiae, YBL016«) 7e-19 
10.04.11 key kinases (S. cerevisiae, YDLJ59w) 3e-18 

01.02.04 regulation of nitrogen and sulphur utilization (3. cerevisiae, 



08.99 other intracellular-transport activities 



(3. cerevisiae, YNL183c) 



01.07 translational control {S. cerevisiae, Y0R283c| 2e-17 

09.04 biogenesis of cytoskeleton IS. cerevisiae, YNL020c) 4e-16 

04.03.99 other trna-transcription activities (S. cerevisiae, YOR061w] 



10.04.99 other nutritional -response activities 



(S. cerevisiae, YJR059w| 
(S. cerevisiae, 

OS. 07 vesicular transport (golgi network, etc.) (S. cerevisiae, YSROSlwJ 

06.04 protein targeting, sorting and translocation (S. cerevisiae, YBR097w) 



(EC) 
(EC) 



01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. 
YHR079C) 8e-05 

BL00479C Phorbol esters / diacylglycerol binding domain proteins 
BL00239B Receptor tyrosine kinase Class II proteins 
' BL00107A Protein kinases ATP-binding region proteins 

dlgol 5.1.1.1.9 MAP kinase Erk2 (rat Rattus norvegicus le-77 

dlwfc 5. 1.1.1. B HAP kinase p38 (human (Homo sapiens) 4e-68 

dlkoa 2 5.1.1.1.7 (1-350) Tvitchin, kinase domain (Caenorhabditi 2e-S5 
dlkoba 5.1.1.1.6 Twit chin, kinase domain (California sea har le-80 
dlphk_2 5.1.1.1.S gamma -subun it of glycogen phosphor y lass kines 2e-76 

dlirk 5.1.1.2.4 insulin receptor (Hunan (Homo sapiens) le-69 

dlapme_ S. 1.1. 1.4 cAKP-dependent pk, catalytic subunit (mouse (Ku le-S4 
dlfgka 5.1.1.2.3 Fibroblast growth factor receptor 1 | human (Horn le-68 
dlydre - 5.1.1.1.3 cAMP-dependent PK, catalytic subunit I bovine (Bo 9e-85 
dlfmk 3 5.1.1.2.2 (168-437) c-src tyrosine kinase (human {Horn le-69 
dlcdka 5.1.1.1.2 cAMP-dependent PK, catalytic subun It (pig (Su le-85 
d2hcka3 5.1.1.2.1 (167-437) Haemopoetlc cell kinase Hek [huma Se-66 

dlcsn 5.1.1.1.11 Casein kinase-1, CK1 (Schiiosaccheronyces pombe 9e-47 

dljsua 5.1.1.1.1 Cycl in-dependent PK (Human (Hooo sapiens) le-75 
dlckja" 5.1.1.1.10 Casein kinase-1, CK1 (rat (Rattus norvegicus) 5e-54 
2.7.1.38 Phosphorylase kinase le-36 
2.7.1.123 Ca2>/calfl»dulin-dcpendent protein kinase 4e-10 
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>IRXW| 
» I RKW ] 



[SUP FX* ) 
(SUPfAM) 



[SUI 



TAM] 
IPFAM] 
[PROSITE] 
I PROSITE] 
[PROSITE] 
[PROSITE] 
[PROSITE] 
[PROSITE] 
[PROSITE] 



2.1.1.128 [Acetyl-COA carboxylase) kinase le-fil 
2.7.1.117 Nyoaln-light-ehain Kinase 2e-40 

2.7.1,109 |Hydroxymethylglutaryl-CoA reductase < NADPK) ] kinase le-61 

2.7.1.37 Protein kinase 7s-42 

phosphotransferase 6s- 66 

nucleus le-64 

calcium 7e-3S 

duplication le-38 

tandem repeat 4e-39 

phorbol ester binding le-38 

line le-38 

cell cycle control le-42 

serine/threonine-specif ic protein kinase 8e-68 

oncogene le-40 

phospholipid binding le-38 

autophosphorylation le-64 

brain la-AO 

heterotetrenier 2e-36 

mitosis 7e-42 

polymer le-35 

magnesium 6e-66 

ATP Be-68 

polyprotein le-40 

phosphoprotein le-64 

apoptosis 4e-39 

glycoprotein 7e-42 

leucine tipper 3e-3S 

skeletal muscle 7e-3S 

protein kinase Se-41 

cAKP binding 3e-38 

testis 9e-36 

purine nucleotide binding 2e-49 
calcium binding Be-39 
alternative splicing 3e-37 
P-loop 2e-49 

lipoprotein 2e-33 | 

segmentation le-33 

core protein le-40 

muscle 7e-35 

myristylation 2e-33 

EF hand 8e-39 

cell division 2e-40 

calmodulin binding 4e-40 

ribosomal protein 36 kinase II 5e-36 

fibronectin type III repeat homology 3e-33 

immunoglobulin homology 3e-33 

calcium-depenoant protein kinase 8e-39 

AMP- activated protein kinase Se-66 

protein kinase akt 3e-42 

protein kinase 5PK1 le-42 

unassigned Ser/Thr or Tyr-speciiic protein kinases 8e-6B 
Ca2+/calmodulin-flependent protein kinase 3e-37 
calmodulin repeat homology 8e-39 

CAMP receptor protein cyclic nucleotide -binding domain homology 6e-33 
protein kinase C leta le-36 

Oictyostelium cAMP-dependent protein kinase catalytic chain le-34 
death-associated protein kinase 4e-39 
pleckstrin repeat homology 3e-42 
ankyrin repeat homology 4e-39 
protein kinase homology 8e-6B 

Ca2+/calmodulin-cependent protein kinase II 8e-41 
protein kinase C zinc-binding repeat homology le-38 
twitchin 3e-33 

protein kinase C delta le-38 

cGMP- dependent protein kinase 6e-33 

protein kinase cdrl 7e-42 

protein kinase C C2 region homology 3e-37 

protein kinase C alpha 3e-37 

yeast protein kinase C 5e-36 

kinase-releted transforming protein le-41 

kinase interaction domain homology l«-42 

gag-akt polyprotein le-40 

Ca2+/calomdul in-dependent protein kinase I 4e-40 
protein kinase C mu 4e-33 
[PROSITE] PROTEIN_KINASE_ATP 2
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 12
[PROSITE] ASN_GLYCOSYLATION 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 10.51 %

SCO MES LV FARRSG PT PS AAELAR PLAEGL I KS PK PLM KKQA V K RHHHKHNLRHR Y E FL ETLG 

SEC xxxxxxxxxxxx 

IctpE HHHHHHHHHHKHHHKCCCCCCCC - -GGGEEEEEEEE 

S EO KGT YGKVKKARESSCR1VA I KS I HKDK1 KDEODLMH I RREI EIM3SLNH PH 1 1 A I HEVFE 

SEC 

IctpE CTTTEEEEEEEETTT E E EE EEEEEHHHH HH KCCKHHH HHHHHHHHCCCTTT BCCEEEEEE 

SEO NSSKI VI VKEY ASRGDL Y DY I S ERQQLS EREARH rrRGI VS A VH YC HQW RWHRDLKLEN 

SEC 

IctpE ETTEE EEEEECTTTTBH HHHHHH HCCCC H HHHHHHHHHHHHHHH H HKHCC EECCCCCGGG 

SCO I LLOANGNI K I AO FGLS NL YHQG K FLQT FCGS PLYAS PEI VWGXPYTGPEVDSWS LGVLL 

SEC 

ICtpE EEETTTTC EEECCTTTT EET - TTT - BCCCCCCCCGCC HHH H HCCC BC -■ H HHH HKHHHHHH 

SEO Y I LVH GTMP FDGH OH K I LVKQ I SNGA Y RE P PKPS DACGL I RttLLMVN PTRRAT LEOV ASH 

SEC 

IctpE HHRHKC C 1 I f T T T T HRHHH HH HH HCCCCCTTTC H H HHHHHHHTTTTTGGGTTTH H HH H HC 

SEQ WW VNWG Y ATR VG EQEA PHEGCH PCS OS ARASNAOWLRRS S RP LLENGAK VC S F FK QH APG 

SEC 

IctpE GG 

SEO GGS TT PC LERQH3 L KK3 RK EM DMAOS LH5 DT AO UTAH RPG K5NLKLP KG I LKKKVSASAE 

SEC 

IctpE 

SEO GVQEDPPELSPIPASPGGAAPU,PKKGILKKPRQRESGYYSSPEPSESGEI.LDACDVFVS 



SEQ 
SEG 

IctpE 

SEQ GAVSEDSILSSESrD0LDLPERLPEPPLROCVSV0HLTCLEEPPSECPGSCLR*WRQDPL 

SEC xxxxxxxxxxxxx 

IctpE 



SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT 

SEG 

IctDE 



Pto»it«' for DKFZphtes3_7j3.2 



PS00001   121->125   ASN_GLYCOSYLATION    PDOC00001
PS00001   576->580   ASN_GLYCOSYLATION    PDOC00001
PS00004   290->294   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   337->341   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   413->417   CAMP_PHOSPHO_SITE    PDOC00004
PS00005   30->33     PKC_PHOSPHO_SITE     PDOC00005
PS00005   74->77     PKC_PHOSPHO_SITE     PDOC00005
PS00005   82->85     PKC_PHOSPHO_SITE     PDOC00005
PS00005   122->125   PKC_PHOSPHO_SITE     PDOC00005
PS00005   142->145   PKC_PHOSPHO_SITE     PDOC00005
PS00005   148->151   PKC_PHOSPHO_SITE     PDOC00005
PS00005   289->292   PKC_PHOSPHO_SITE     PDOC00005
PS00005   327->330   PKC_PHOSPHO_SITE     PDOC00005
PS00005   339->342   PKC_PHOSPHO_SITE     PDOC00005
PS00005   373->376   PKC_PHOSPHO_SITE     PDOC00005
PS00005   377->380   PKC_PHOSPHO_SITE     PDOC00005
PS00005   616->619   PKC_PHOSPHO_SITE     PDOC00005
PS00006   15->19     CK2_PHOSPHO_SITE     PDOC00006
PS00006   133->137   CK2_PHOSPHO_SITE     PDOC00006
PS00006   148->152   CK2_PHOSPHO_SITE     PDOC00006
PS00006   227->231   CK2_PHOSPHO_SITE     PDOC00006
PS00006   293->297   CK2_PHOSPHO_SITE     PDOC00006
PS00006   331->335   CK2_PHOSPHO_SITE     PDOC00006
PS00006   377->381   CK2_PHOSPHO_SITE     PDOC00006
PS00006   391->395   CK2_PHOSPHO_SITE     PDOC00006
PS00006   461->465   CK2_PHOSPHO_SITE     PDOC00006
PS00006   511->515   CK2_PHOSPHO_SITE     PDOC00006
PS00006   523->527   CK2_PHOSPHO_SITE     PDOC00006
PS00006   578->582   CK2_PHOSPHO_SITE     PDOC00006
PS00006   606->610   CK2_PHOSPHO_SITE     PDOC00006
PS00007   453->460   TYR_PHOSPHO_SITE     PDOC00007
PS00007   453->461   TYR_PHOSPHO_SITE     PDOC00007
PS00008   320->326   MYRISTYL             PDOC00008
PS00008   324->330   MYRISTYL             PDOC00008
PS00008   347->353   MYRISTYL             PDOC00008
PS00008   360->366   MYRISTYL             PDOC00008
PS00016   134->137   RGD                  PDOC00016
PS00107   59->82     PROTEIN_KINASE_ATP   PDOC00100
PS00107   59->86     PROTEIN_KINASE_ATP   PDOC00100
PS00108   171->184   PROTEIN_KINASE_ST    PDOC00100
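
Each Prosite hit in this listing consists of an accession (PS...), a residue range, a motif name and a documentation ID (PDOC...). As a minimal sketch, assuming the hits have been cleaned up into one whitespace-separated row per hit (the row layout and the helper names `parse_prosite_hits` and `count_by_motif` are illustrative assumptions, not part of the original report), the hits can be parsed and tallied per motif like this:

```python
import re
from collections import Counter

# Assumed row layout: accession, "start->end", motif name, documentation ID.
HIT_RE = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)\s+(PDOC\d{5})$")


def parse_prosite_hits(text):
    """Return a list of (accession, start, end, name, doc_id) tuples."""
    hits = []
    for line in text.splitlines():
        match = HIT_RE.match(line.strip())
        if match:  # silently skip blank or malformed rows
            acc, start, end, name, doc = match.groups()
            hits.append((acc, int(start), int(end), name, doc))
    return hits


def count_by_motif(hits):
    """Count how many hits were reported for each motif name."""
    return Counter(name for _, _, _, name, _ in hits)


# Four rows taken from the DKFZphtes3_7j3.2 Prosite listing.
example = """\
PS00001 121->125 ASN_GLYCOSYLATION PDOC00001
PS00004 290->294 CAMP_PHOSPHO_SITE PDOC00004
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
"""
hits = parse_prosite_hits(example)
counts = count_by_motif(hits)
```

The per-motif counts obtained this way correspond to the `[PROSITE]` summary lines of the Pedant reports, which list each motif name followed by its number of occurrences.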



Pfam for DKFZphtes3_7j3.2



Query 
HUM 
Query 
HMM 



Eukaryotic protein kinase domain 

•YeigRil GeGiFCtVY kCiWrTGe I va IKIIkkitas Fl RE I 

YE*++*+C*G++G+V+K+++ +C++VAIK !♦*♦+♦+ ♦♦REI 
53 Y EFLET LG KGT YGK V KKAR ES SGRL V A I KS I RKDKI KOEODLHH I RREI 

q I MRrLnH PHI IRFYDvFedddDH I YH IMS YMeGGDLFDY I r rngpMS Cw 
+ IM +LNHP+II * ++FE ♦ + I ++MEYt GDL+DYI++* + *SEt 
102 EIMSSLNHFHI IAIHEVFE-NSSKIVJ VNEYASRGDLYDYISERQQLSER 

•IrCIMyQILrGM«YLHSH9lIKROLKPEHILID«H9qIKlc0rGLARqH 
E*R++*+0I+*++ Y+H ++++HRDLK EHIL+D NG+IKI+DFGL* + ♦ 
1 5 1 EARH FFRO I VS AVH YCHQNR WHRD LKLEN I LLDANGN I K I A D FG LSN L Y 

nnYt tHt t f CGTPWYHNAPEVI Iraq . nyY 1 1 kVDMHS FGC I LWEMNTG«p 
+ ♦ TFCG*P Y *PE* **C *Y + *+V0 WS+G++L+++* G> 
201 KQGKFLQTFCGSPL YA -S PEI - VNGKPYTGPEVD5WS LGVLLY I LVHGTM 

PFyddiU1«mIi»rIiqrfrrp»pnCSeElyDFMrwCWnyDPekRPTFrQI 
PF+ + + ++ I + ♦+* +p s+ ♦ ++RW++ ++P++R T ♦*+ 
249 PFDGHOHKI LVKQISNGAYREPPKPSD-ACGLIRWLLHVNPTRRATLEDV 



298 ASHWWV 
DKFZphtes3_7j8



group: testes derived

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human
WUGSC:H_DJ1159O04.1.

The novel protein contains an additional C-terminal domain, which is not present in
WUGSC:H_DJ1159O04.1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

WUGSC:H_DJ1159O04.1 similarity to YBL104p

verifies and extends the genemodel WUGSC:H_DJ1159O04.1
similarity to S. cerevisiae YBL104p

Sequenced by BMFZ

Locus: /map="7p21-p22"



1 GCAAAATATG TTGTATTTGT GGCATACTTC ATATTTACAC TATCATAAAA 

51 TTATGGCCCA CAACTTAAAT ATTCTAAATC TGTCAACATA GTTCTCTGTA 

101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 

151 TTTTCTGTTT TTAGGAATGC TGGAAAGCAG CAGACATAAT TGGAGTGGGT 

201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 

251 TTACAGCTT7 GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC 

301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 

351 rGGCATTGTT CAACTTGGAT ATTCGCCGAG CAATCCAAAT CCTGAATGAA 

401 GGGGCATCTT CT6AAAAAGG AGATCTGAAT CTCAATGTGG TAGCAATGGC 

451 TTTATCGGGT TATACGGATG AGAAGAACTC CCTTTGGAGA CAAATGTCTA 

501 GCACACTGCG ATTACAGCTA AATAACCCGT ATTTCTCTGT CATGTTTGCA

551 TTTCTGACAA GTGAAACAGG ATCTTACGAT GGA G TTTTGT ATCAAAACAA 

601 AGTTGCAGTA CGTGACAGAG TGGCATTTGC TTGTAAATTC CTTACTGATA 

651 CTCAGTTAAA TAGATACATC GAAAAGTTGA CCAATGAAAT GAAAGAGGCT 

701 GGAAATTTGG AAGGAATTTT CCTTACAGGC CTTACTAAAG ATGGAGTGGA 

751 CTTAATGGAG AGTTATGTTG ATAGAACTGG AGATCTTCAA ACAGCAAGTT 

801 ACTGTATGTT ACAGGGTTCA CCTTTAGATG TTCTTAAAGA TGAAAGGGTT 

851 CAGTACTGGA TTGAGAATTA TAGAAATTTA TTAGATGCCT GGAGGTTTTG 

901 GCATAAACGA GCTGAATTTC ATATTCACAG GACTAACTTG GATCCCAGTT 

951 CCAAGCCTTT AGCACAAGTT TTTGTGAGTT GCAATTTCTG TGGCAAGTCA 

1001 ATCTCCTACA GCTCTTCAGC TGTGCCTCAT CAGGGCAGAG GTTTTAGTCA 

1051 GTATGGTGTG AGTGGCTCAC CAACGAAATC TAAAGTCACA AGTTGTCCTC 

1101 GCTGTCGAAA ACCACTTCCT CGATGTGCGC TnGTCTCAT TAATATGGGA 

1151 ACACCAGTTT CTAGCTGTCC TCCAGGAACC AAATCAGATG AAAAAGTGGA 

1201 CTTGAGCAAG GACAAAAAAT TACCCCAATT TAACAACTGG TTTACATGGT 

1251 GTCATAATTG CAGGCACGGT GGACATGCTG CACATATGCT TAGTTGCTTC 

1301 AGGGACCATG CAGAGTCCCC TGTGTCTGCA TGCACGTGTA AATGTATGCA 

1351 CTTGGATACA ACGGGGAATC TCGTACCTGC AGAGACTGTC CAGCCATAAA 

1401 ATGTTACCAC CTTAAGAGAA CCCTTCAAGT GTGGAGCTTT CTAGTAGGTG 

1451 TCCTTCATAG CTCAGAAACA TACCTCACAA CAAGCCATTC ATGACTTACC 

1501 TGTAATGGGA AAATAAATCA TTCTATCAGA TCAGCAGTTT TGATCTTTGA 

1551 GTGATTTTGA TATGCTTCAC AGAGACAAAT GCTGCCAAAA TAAACATCGA 

1601 AGTATAGACA TGAGTTCTCT TCAGCAGGTT GAAAAGTCTG ATTTAGAAAA 

1651 ACTTTCTAAG TTTTGGTTGA AATTATGAAC AC7CTAGAAG CAGAATTTCT 

1701 GGAAGAGCCA AGAACAGACT TTGAGCCTAT ATCTTCAAAG CTGAAACTGG 

1751 ATATCTTTCA ATAAAATATG TGCACTTTTA AAATAAAATG ACTAATTCTG 

1801 TGATTCAGAC AATAGTTTTA AGTTCAGCTG TGCTTAGATT TCTTTCAGAT 

1851 TAATTTAAAA TTATAGATTT TTACTTTTAG AATTGCAGAG CCCCTATCCC 

1901 ACACTGGAGA ATATTTTTTA TTACTGTCTG TTATATATGT GTCTATGTGT 

1951 GTGTGTATAT TTATGTGTGT ATGTATAAAT ATGTACTTTT TAAAGGAGCC 

2001 TTTTCCCTCC TTTGATTTTA AGATAAGCAA TCTTTTGGCA TAACATTATC 

2051 GTCTTCCTAG AAAACCCAAG ATGAAGAATC TATCTTACAA CTTTTTCTCT 

2101 TCACTAGAGA AAAACATGTA CCATTTCAGG TGAACATACA AAATTTTCAC 

2151 TTTCTACCTT TTGCCTTCCA ATGTCCTGAT TTGTCTTCAA AGGTTTTTCT 

2201 CCATATTAAT TTGTCATCTT ATCCTCATCA CCTGAGAACA TTTTACTGCA 

2251 TACAAAGTCT ATGCAAGATT ATATGTAACT AGCCATTTAG TATAATCTAT 

2301 GTCACTCTTT CTGTGCTGTC AAATTCCGTC CTGATTTGGA ATACCATACC 

2351 TTGTTCTTTC CAAGGTAGAC TAGGAAGTGT TGGGGAAATA GGGTCACTTC 

2401 AGAGACCATT TTAGATGTAA G I T TfTAAAT GTAAGTGTTA CTGGGGCTAA 

2451 GTCAGGGACT TTATTTAAAA CATTTTTTTT TTCTCATTTC ATAGCTAGAT 

2501 AGTTGTAAGA GAAATACAAA GAATTTACAA GATGCTTCTC TGTCATCTGC 
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTCTGTTCAT

2601 AAAGAGTAAA TCTTATTTTA TACATTTTGC AGAAATAAAA CAACAATTTT

2651 AACAGCTTTC CTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG

2701 ATTACTTTGA CCTGTAACCT AAATATTAAA AGTACATTAA ATTTATTTTT

2751 TACCTTGAGT CTCTCATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAC

2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT

2851 GAAAGACATG CTAATTSCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC

2901 TTTTTCCAGC CTTCATTTCA GTAAATCTTA ATTGATTTCA TTTTATTAAC

2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA

3001 CTGTCTTAAA TATCAAACTC AGCTTTAAGT AATGTCAGAC TCATATGCAT

3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC

3101 AACTTTTGGA AATACAGTAT AAAACATCAA TGTAAAGTCT ATTATGTAAT

3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT

3201 GTATCTGTCA ACCTCTTAAA TGTTTTCTGT CAAAAAAAAA AAAAAAAAAA

3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

3351 AAA



BLAST Results



No BLAST result



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 167 bp to 1396 bp; peptide length: 410
Category: known protein 
Classification: unclassified 



1 MVESSRHNWS GLDKQSDION LNEERILALQ LCGWI KKGTD VDVGPrLNSL 
51 VQEGEVERAA AVALFNLOIR RAIQILNEGA SSEKGDLMLH WAMALSGYT 
101 0EKNSLWREM CSTLRLQLNH PYLCVMFAFL TSETCSYOGV LYENKVAVRO 
151 RVAFACKFL5 DTOLNRYIEK LTBEMKEAGN LEG I LLTGLT K0GVDLMESY 
201 VDRTCDVQTA SYCHLOGSPL DVLKDERVOY VIEHYRHLLD AWRFWHKRAE 
251 rOIKRSKLOP SSKPLAQVrV SCNFCGKS1S Y5CSAVPHQG RCFSOYGVSG 
301 SPTKSKVTSC PGCRKPLPRC ALCLIHMGTP VSSCPGGTKS DEKVOLSKDK 
351 KLAQFNNWFT WCHNCRHGGH AGHMLSW FRO KAECPVSACT CKCHOLDTTG 
401 NLVPAETVQP 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7j8, frame 2

PIR:S45391 probable membrane protein YBL104c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 446, P = 4.5e-47

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P =
1.6e-211



>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence.
Length = 379

HSPs:

Score = 2038 (305.8 bits), Expect = 1.6e-211, P = 1.6e-211
Identities = 379/379 (100%), Positives = 379/379 (100%)

Query: 1 KVES S RH JWSGLDKOS 0 1 Qtf UtEERI LALQLCGN IKKGTDVDVGP FLNS LVQEGEWERAA CO 

MVES S RHNWSG LDKQS 0 1 QNLH EERI LALQLCGWI KKGTDVDVGPFLH5 LVQEGEWERAA 
Sbjct; 1 KVESSRKKWSGLuKQS DIQMLNEER I LALOLCGW1 KKGTDVDVGPFLN5 LVQEGEWERAA 60 

Query: 61 AVALFNLDIRMIOILNE»SSEKC0LHU<VVAHAI^GYTOEICNSLWR£MCSTLRlX]LNH 120 

A V ALFN LD I RRA I Q I LHECASS EKC DLHLNV V AMA US GYTDE KM S LWREMC STL RLQLUN 
Sbjct: 61 AV ALFN LD1RKAIQ1 LHECASS EKG DLM LHW AMA LSG YT DEKN3 LWR EMC ST LRLQWH 120 
Query: 121 PY LC VMFA FLTS ETGS Y DG VL Y E*tK V A VRDRV AFAC K FL SDTQLHR Y I EK LT M EMKEACN ISO 

PYLCVMrArLTSrrCSYDGVLYEHKVAVRORVArACKrLSOTQLMBVIEKLTWEMKCACM 
Sbjct: 121 PV LC VMFA FLTS ETG 5 Y DG V L Y ENK V A VRQR VA FAC K FL5 DTQLN R Y I E KLTN £MK EACH 180 

Ou«ry: 1B1 LEG I LLTGLT KDG V DLME5 Y V D RTC DVQT AS YCMLQG S PLDVLKDE R VQYW I EJI YRNLL D 240 

LEG I LLTGLTKDGVDLMES YVDRTCDVQTASYCMUOGS PLDVLKDERVOYWI EH YRHLLD 
Sbjct: 191 LEG I LLTGLTKDCVDLMES YVDRTGOVQTASYCMLOGS PLDVLKDERVQYWI EH YRNLLD 240 

Query: 241 AWRFWH K RAE FD I K US KLOP S S K PLAQV FV3CN FCG KS I S YSCSAV PH QGKG FSOYGV 3G 300 

AWRFWKKRAEroi HRS KLDPSSKPLAQVrVSCKFCCKS 1 5 YSCSAV PHQGRCrSQYCVSG 
Sbjct: 241 AWRFWHKRAEFDI HRS KLDPSSKPLAQVFVSCNFCGKS I S YSCSAVPHQGRGF5QYGVSG 300 

Query: 301 S PT KS KVT3C PGC RJC P L P RC ALC L I MHGT PV5 SC PGGT KSDEKVDLS KDKKLAQFNNW FT 360 

5PTKSKVTSCPGCRKPLPRCALCLI NHGTFVSSCPGGTKSDEKVDLS KDKKLAQFNNW FT 
SbjCC: 301 SPTKSKVTSCPGCWtPLPRCALCLI«MGTPVSSCPCCTKSOEKVOLSKOKKLAQrWNMrT 360 

Query: 361 WCHNCRHGCHACKMLSWFP 379 

WCXNCRHGGHAGHMLSWFR 
Sbjct: 361 WCKNCAHGGHAGHMlSWrR 379 



Pedant information for DKFZphtes3_7j8, frame 2



Report for DKFZphtes3_7j8.2



[LENGTH] 410

[MW] 45862.45

[pI] 6.51

[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04
from 7p21-p22, complete sequence. 0.0

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins

[BLOCKS] BL00534A Ferrochelatase proteins

[PIRKW] transmembrane protein 2e-46

[KW] All_Alpha



SEO KVE S S RHNttSG L DKQSD I QNLNEER I LALQ LCCWI K KGT D V DVG P FLNS LVQEG EWERAA 

PRO cceceeccc ec c cccccc hhh hhhhhhhhhhhe cc cccccccc eccccccecc cchhhhh 

SEO AVALrNLOIRRAIOI LHEGAS SEKGD LM LNW AMAL5G YTD EKN S LWREMCST LRLQLNN 

PRO hhh hh hh hh hhhhhhhh ccc chhh hhhhhhh h hhh h hhc ccccehhhh hhh h h hhhhcc c 

SEO P YLC VM F A FLTS ETGS Y DGV L Y EN K V A V RDR VA FACKrLS DTQLN R Y I £ KLT It ENK EAGN 

PRO c cc cce e ecccc ccc ccc c eee cc chhhhh h hhhhhhhhcccchhhhhhhhhhhhhhhcc 

SEO LEGI LLTCLf KCCVBLWS YVORTGDVQTAS YCKLQGS PLDVLKDERVQYW I EWYRHLLD 

PRO cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhlihtihnhhhhhhhhhh 

SEO AwRFWHKRAEFDlHRSICLOPSSKPLAOVrvSCNrCGKSISYSCSAVPHQGRGrSOYCVSG 

PRO hhhhhhhhhhhhhhcccccccccceeeeteeccccccccccccccccccccccccccccc 

SEO S PTKSKVTSC PGCRK PL PRCALC L I NMGT P VSSCPGCT K S DEK VD LS KD KKLAO FWUWFT 

PRO ccccccccccccec cccc cee e e ecccece cccccccc c ccc e e e eh hhhhhhhhc c eee 

SEO W C K NC RKGGHAGHHLSW FRDH AEC P VS ACTC KCNQLDTTGN LVPAETVQP 

PRO e e e cccc eccech h h hhh hhhec ccccccc c c ccccccc ecce cccc cc c 



(No Prosite data available for DKFZphtes3_7j8.2)
(No Pfam data available for DKFZphtes3_7j8.2)
DKFZphtes3_7p10



group: Cell Cycle

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to
the Xenopus laevis XPMC2 protein.

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after
completion of DNA synthesis. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a
mitotic catastrophe phenotype. XPMC2 of Xenopus rescues several different yeast mitotic
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in
the nucleus in Xenopus oocytes. The new protein is the human orthologue of this gene.

The new protein can find application in modulating/blocking the cell cycle.



strong similarity to XPMC2 protein
complete cDNA, complete cds, EST hits
Sequenced by BMFZ
Locus: /map="9q34"
Insert length: 2390 bp

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318



1 AGCCTCCCTG CTGAGGTATG CCCAACGCGT GCGGGGTCTC TTCCCGAGTC 
51 TTTTCCTCGA CGCGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 
101 AGGCGCTTGT GCTGCCAGGG CGCCGCGCCC GGGGAGGCCG GGCTCTCGGG 
151 TCCCCGCCGG CCCAGCCGCT GCACGCCACC AGGATCGCGA ACGCCAACCT 
201 CCCCCCCTCC AAGCGCCCCC CGAGCAGCCC CCTGGCTAAG CCCCCTCCTC 
251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 
301 AGCAAGGCGC GCCAAGTAAC CAACAAGCCA CCAAGCGGCC CCGGTCCTCT 
351 GGTGCGACCT CCAAACGCAC CAGAAGACTT TTCTCAAAAC TCCAAGCCGC 
401 TGCAAGACTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAACCCTCTT 
451 GTCATCTCTC ACATGCGTTC CAAAAAGAAG CCCAAAATTA TCCACCAAAA 
501 CAAAAAAGAG ACCTCGCCTC AAGTGAACGG AGAGGAGATG CCGGCAGGAA 
551 AAGACCAGGA GGCCAGCAGG CGCTCTGTTC CTTCAGGTTC CAAGATGGAC 
601 AGGAGGGCGC CAGTACCTCC CACCAAGGCC ACTGGAACAG AGCACAATAA
651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTCTTCCA GAACGAGGGG 
701 ACATCGAGCA TAACAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 
751 ACCGAGGAAG ACATCTGGTT TCACCACGTC GACCCAGCCG ATATCGAACC 
801 TGCCATACGT CCAGAGGCCC CCAAGATAGC GACGAAACAG TTGCCTCACA
851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCCGCGGC 
901 CTGACAAGAG CCTTAGCCTT CGACTCTGAC ATGGTGGGCG TGCGCCCTAA 
951 GGGCGAGGAG AGCATCGCCG CCCCTCTCTC CATCGTGAAC CACTATGGGA 
1001 AGTGCGTTTA TGACAAGTAC CTCAAACCAA CTGAGCCCGT GACGGACTAT 
1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA
1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA CATGCTGAAG GGCAGAATTC 
1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 
1201 CCAAAAAAGA AGATTCCGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 
1251 AGTAAAGAGT CGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 
1301 CGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 
1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG 6AGTGGGAGA GCATGGCCCG 
1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT
1451 AGCAGTCCTG CCCTGCTCCT CCTCCCGCCC CGCTACAGAG GCAATGTGAC 
1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 
1551 CTTTTCAGAA TCATGGCAGA GGGGCCTGGC GTGGTGCTAC TGAGAAGCTC 
1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 
1651 TGTTTGCGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 
1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TCAGGTTGGG 
1751 GCCTGTGGGC CGCCAGTCCA TACCGTGCTG TCACTGCCCA TCTTCGGTCA 
1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC
1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCT? GAGGGACCCA 
1901 CACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 
1951 GTCTTACATC AGGGCTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 
2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 
2051 CCCCATAGCA CGAGGATCTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 
2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 
2151 TGTGATGCTA GCCGGTGGCT TTCTGCGCTA GGCCCCAGTT TGAGGCTCCC 
2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAC CGGACGGAGT 
2251 CCAACCCGTT ATTGGGCCAC CTGACAGCTG CACAGAAAAG GGGCAGACAC 
2301 ACCGACGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
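
The report for this insert gives one position for the poly-A stretch and one for the polyadenylation signal. The sketch below shows one way such positions could be located in a cDNA string. It rests on several assumptions that the original does not state: the canonical hexamer `AATAAA` is used as the signal, the poly-A stretch is taken to be the terminal run of A's, and positions are 1-based. The function names are hypothetical, and the coordinate convention of the original annotation pipeline may differ by an offset.

```python
def find_polya_signal(seq, signal="AATAAA"):
    """Return the 1-based start of the last occurrence of the signal, or None."""
    idx = seq.upper().rfind(signal)
    return idx + 1 if idx >= 0 else None


def find_polya_stretch(seq, min_len=10):
    """Return the 1-based start of the terminal run of A's.

    The run must be at least min_len bases long; otherwise None is returned.
    """
    s = seq.upper()
    body = s.rstrip("A")          # everything before the trailing A's
    run_length = len(s) - len(body)
    return len(body) + 1 if run_length >= min_len else None


# Toy 3' end (not the real insert): a short tail carrying one AATAAA
# hexamer, followed by 20 A's.
toy = "GCGATTTAAAATAAATGCAGATGTTTACTTGG" + "A" * 20
signal_pos = find_polya_signal(toy)
stretch_pos = find_polya_stretch(toy)
```

On the toy sequence the hexamer starts at position 10 and the poly-A run at position 33. Applied to a full insert sequence, the two values give the distance between signal and tail, which is typically a few tens of bases.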



BLAST Results 
Entry HSAC2099 from database EMBL:

*** SEQUENCING IN PROGRESS *** Genomic sequence from Hob
phase 1, 2 unordered pieces.

Score = 5055, P = 0.0e+00, identities = 1011/1011
5 exons Bp 101219-116190



in 9q34; HTGS



Medline entries



95157530:

Cloning and expression of a Xenopus gene that prevents mitotic
catastrophe in fission yeast.



Peptide information for frame 1



1 MGKAKVPASX RAPSSPVAKP GFVXTLTRKK NKKKKRFWKS KAREVSKKPA
51 SGPGAWRPP KAPEDFSQNW KALQEHLLKQ KSQAPEKPLV ISQKCSKKKP
101 KIIQQNKKET SPQVKGEEMP AGKDQCASRG SVPSGSKMDR RAPVPRTKAS
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD
201 PADIEAAIGF EAAKIARKQL GOSEGSVSIS LVKEQAFGGL TRALALDCEM
251 VGVGPKGEE5 KAARV5IVHQ YGKCVYDKYV KPTEPVTDYR TAV3CIRPEN
301 LKQGEELEW QKEVAEKLKG RILV&KALKN DLKVITLDHP KKKIRDTQKY
351 KFFKSOVKSa RFSLRLLSEK ILGLQVOOAC HCSIQDAQAA KRLYVNVKXE
401 WESHAADRJLP LLTAPOHCSD



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7p10, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_7p10, frame 1

Report for DKFZphtes3_7p10.1



[LENGTH] 422
[MW] 46671.91
[pI] 9.79
[HOMOL] PIR:S53518 XPMC2 protein - African clawed frog 7e-96
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 2e-19
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae,
YGL094c] 7e-13
[FUNCAT] 04.05.05 mRNA processing (5'-end, 3'-end processing and mRNA degradation) [S.
cerevisiae, YGL094c] 7e-13
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.37 %



MG KAK VP A SKRAPSSPVAK PGPVKTLTRK KNKK KK R FWKS KAREV S KK P ASG PGA WRP P 

ccccecc ccceccc c cccec e ccc cehhhhhhhn h hh h h hhhhc cccc cccccccccc cc 

KAPEDFSONWULQEMLLKQIUO^PCKPLVISOMCSKKKPKIIOONKKETSPOVKGEEMP 

xxxxxxxxxxxx 

ccccecc chhhhhhiihhh nhhnhceccccccc cccec c ccc ee e ecccecccccccee ee 
AGKDQ EAS RCS V PSGS KMDRRAP V PUT KA5CTEHN KKGT K ERTHG DI V P ERG D I EH K K RK 

« cc c cccccc c e ecec ccc cc cccccccecc cec c ccccccccc eee h h hh hhhhhhhhh 

AKEAAPAPPTEEOIttrODVDPAOIEAAICPCAAKtKRKOLCOSECSVSLSLVKEOArGGL 

h hhh cccccc c c e« «« ccc c cc h hhhhhc cc hh h htihh h hhe ececc h h h h hhhh h hh hh 
TRA LA LDC ENVG VG P KGEESMAA RV S I VNOY GKCVYDKYVRPTE PVT D Y RT A VSG I RPEN 
hhhcccceccccccccchtihhhhhhhccccccc«*««««*cccccccccccccccccccc 
LKQGE ELEWQK EV A CKLKGR I L VCHALH K D L K VL TLOK P KK K I RDTQK Y KP ["K5 0V KSG 

ccccchhhhhhhhhhhhhhcc«tt«ccchnhhhhhhhcccccccccc««ecccccccccc 

RPSLRLLSEKI LGK2VQQAEHCS IQDAQAAKR1.YVMVKKEWESMARDRRPLLTAPDHCSD 
chhhhhhhhhhhhhhhceccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 



Prosite for DKFZphtes3_7p10.1



PS00002   51->55     GLYCOSAMINOGLYCAN    PDOC00002
PS00004   107->111   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   156->160   CAMP_PHOSPHO_SITE    PDOC00004
PS00005   9->12      PKC_PHOSPHO_SITE     PDOC00005
PS00005   27->30     PKC_PHOSPHO_SITE     PDOC00005
PS00005   46->49     PKC_PHOSPHO_SITE     PDOC00005
PS00005   96->99     PKC_PHOSPHO_SITE     PDOC00005
PS00005   347->350   PKC_PHOSPHO_SITE     PDOC00005
PS00005   359->362   PKC_PHOSPHO_SITE     PDOC00005
PS00005   363->366   PKC_PHOSPHO_SITE     PDOC00005
PS00005   368->371   PKC_PHOSPHO_SITE     PDOC00005
PS00006   136->140   CK2_PHOSPHO_SITE     PDOC00006
PS00006   150->154   CK2_PHOSPHO_SITE     PDOC00006
PS00006   163->167   CK2_PHOSPHO_SITE     PDOC00006
PS00006   190->194   CK2_PHOSPHO_SITE     PDOC00006
PS00006   383->387   CK2_PHOSPHO_SITE     PDOC00006
PS00006   413->417   CK2_PHOSPHO_SITE     PDOC00006
PS00007   343->351   TYR_PHOSPHO_SITE     PDOC00007
PS00007   342->351   TYR_PHOSPHO_SITE     PDOC00007
PS00008   130->136   MYRISTYL             PDOC00008
PS00008   151->157   MYRISTYL             PDOC00008
PS00008   221->227   MYRISTYL             PDOC00008
PS00008   239->245   MYRISTYL             PDOC00008
PS00016   171->174   RGD                  PDOC00016



(No Pfam data available for DKFZphtes3_7p10.1)
DKFZphtes3_7p9



group: nucleic acid management

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain
10 protein NDP52.

The nuclear domain (ND) 10, also described as POD or Kr bodies, is involved in the development of
acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed
upon viral infection and interferon treatment. ND10 plays an important role in the viral life
cycle.

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell
attachment site. This protein seems to be a novel part of the ND10 complex.

The new protein can find application in modulation of viral infections and tumour events.



similarity to nuclear domain 10 protein NDP52
complete cDNA, complete cds, EST hits
Sequenced by BUM

Locus: /map="329.1 cR from top of Chr12 linkage group"
Insert length: 3003 bp

Poly A stretch at pos. 2937, no polyadenylation signal found



1 AACGTGAGGG GAACAGCTGA TCCCTCTGTT GCGAGGACAG ATATCTCAAG
51 GCCACGATGG AACAATCACC ACTAACCCGG GCACCATCCC GTCCTGGAGT
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTCCAAT
151 CTCACTACAC CCTTCCCCC* GCCACCATCC CCAGTGCCAG TCACTCGATT
201 CCCATCTTCA AGGTGGAGGC rCCCTGTGTT CGGGATTACC ACACATTTGT
251 CTCCTCTTCC CTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA
301 GTGTCCACTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC
351 CAGTTCCCAT ATGTCAACCG CCAGGGCCAG CTGTGTGGGC AGAGCCCCCC
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TCAACT5GTG ACCCTGGAGG
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATCCA
551 GCTGAACCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG
651 ATGGAAGAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA
701 CAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGCCCCTGA CTCGGGAACA
851 AGAGAAGCTC CTTGGGCAAC TCAAACAAGT ACAAGCAGAC AAGCAGCAAA
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC
1051 AGCACCGGGT GCCCGAGCTG GAGCCCTTGA ACGAGCAGCT TCGAGGGGCC
1101 CAGCAGCTTG CAGCCTCAAG CCAGCAOAAA GCCACCCTTC TTGGGGAGGA
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC CAACTACACC
1201 GCAGCCGCCT GGAACTGGCT CAACTTAACG GCAGGCTGCC TCAGCTCGGT
1251 TTGCACTTGA ACCAAGAAAA ATGCCAATGG AGCAAGGAGC GCGCAGGGCT
1301 GCTGCAGAGT GTGGAGGCAG AGAAGCACAA GATCCTGAAG CTGAGTGCAG
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGCAC CCAAAACCAA
1401 GTGTTCAAGA CTGAGCTGCC CCGGGAGAAG GATTCTAGCC TGGTACAGTT
1451 CTCAGAAAGT AAGCGGGAGC TCACAGAGCT GCGGTCAGCC CTGCGTGTGC
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG ACAAACAGCA ATTGCTAGAG
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTCGCAG ATGAGAAGTG
1601 GAATGAGCAT GCCACCACAG AGGATCACGA GGCCGCTGTC GGGCTGAGCT
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG
1701 CTCCCACCCT ATGGCCTTTC TGAGCGTGGA GACCCAGGCT CCTCTCCTGC
1751 TCGGCCTCCA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA
1901 GGAGGCCAAC TTACTGCTTC CTCAACTGGG CACTCCCTTC TATGACATGG
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TCGGCGCCCT
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT
2201 AGGTTTCATC CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC
2251 CTAACAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACrTC
2301 ATCCTCTCCT ACCTCCCTCT TTTGTCCCAG GGAGGCGTCC TGTTCGGAAG
2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGCAGGAA CCGGGATGGA
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTCGTTGC
2451 TTCTGATACA CAGCCACTCC ACACACACAC TCACACTCAC ACTCCCTTCT
2501 CTCATCCCCC AAACCCAATT CCTGGCGCAC CCTACCCTCT CTTATTTCGA
2551 CTTTCCGTTG GTT7ACCTGA OTTTTCTCTG GGGTCTGCAC ACAGGCAGCA
2601 GCATGGACAT CATGCCCTCT CAGGTCCCTT TTGGTTCTCA CTTTCATTGC
2651 TTCCTCTTTC TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC
2701 CATAACCTTA GCTATTCAGT TTGCACGCGT TTTTTGTATT TTTCACCATT
2751 CCTCTATTCT CTATCCTCTC CTCCCATCTC CTCACATGG* AAGAAATAAT
2801 CTATTTGTGC CTTCTCTCAC GAATGCGCGG AACAAGTGGT CCCASCTATC
2851 CCCATTTCCA ACGCCCCCCT CCCTCTCCAG GTCCCCCCAC AGCAATAAAA 
2901 CCTTCCCCCT CATATCCATC CCTTTCTACT TTGAACAAAT ATATTTATAT 
2951 CATATCTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3001 AAA 



BLAST Results



Entry HSH93S3 from database EMBL:
human ST3 HI-112&1.
Score = 2191, P = 1.4e-92, identities = 463/465



Medline entries



95310349:

Molecular characterization of NDP52, a novel protein of the
nuclear domain 10, which is redistributed upon virus
infection and interferon treatment.

97375672:

Cellular localization, expression, and structure of the nuclear
dot protein 52.



Peptide information for frame 3



ORF from 57 bp to 2129 bp; peptide length: 691
Category: similarity to known protein
Prosite motifs: RGD (557-560)
LEUCINE_ZIPPER (163-185)
LEUCINE_ZIPPER (475-497)
LEUCINE_ZIPPER (482-504)
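
The ORF coordinates and the peptide length reported above are related by simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the reported ORF. This convention is inferred from the reported numbers rather than stated in the report, and the function name is hypothetical.

```python
def peptide_length(orf_start, orf_end):
    """Number of residues encoded by an ORF with 1-based inclusive bounds.

    Assumes the stop codon is not part of the ORF interval.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return n_bases // 3


# DKFZphtes3_7p9, frame 3: ORF from 57 bp to 2129 bp.
length_7p9 = peptide_length(57, 2129)
```

Under this assumption the 2073 bases between positions 57 and 2129 give 691 codons, matching the reported peptide length.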



1 KEESPLSRAP SRGGVNrLHV ARTYI PNTKV ECHYTLFPGT MPSASDWICI 
SI PKVEAACVRD YHTFVWSSVP ESTTDGSPIK TSVQFQASYL PKPGAQLYQF 

101 ryvhrocqvc oosppronu: prpmoeuvtl eeadggsdil lwpkatvlq 

151 WQLDESQQER MDLM0LKLOL EGQV7ELRSR VQELERALAT ARQEHTELNE 

201 OYKCISRSHC EITEERDILS RQQGDHVARI LELEDDIQTI 5EKVLTKEVE 

251 LDRLRBTVKA LTREQCKLLS QLKEVOADXE QSEAELQVAQ QEMHHLHLOL 

301 KEAKSWQEEO SAQAQRLKDK VAQKKDTLGQ AQQRVAELEP LKEQLRGAQC 

351 LAASSQOKAT LLGEELASAA AARDRTIAEL HR5RLEVAGV MGRLAELGLH 

401 LKEEKCQMSK ERACLLQSVE AEKDKILKLS ACILRLEKAV QEERTQNQVF 

451 KTELAREKD3 3LVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM 

501 RKLEARLEKV ADEWHEDAT TEDEEAAVGL SCPAALTOSE DESFEDKRLF 

551 PYGLCERGDP G5SPAGPREA SPLVVISQPA FISPHLSGFA E0SSSDSEAE 

601 DEK3VLMAAV QSGGEEANLL LPELCSAFYD MASCFTVCTL 5ETSTGGPAT 

651 PTWKECPICK ERPPAESDKD ALEDKKDCHF rTSTQOPFTr E 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7p9, frame 3

PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307,
P = 7.7e-28

TREMBL:AB008052_1 gene: "NDP"; product: "NDP52"; Bos taurus mRNA for
NDP52, complete cds., N = 2, Score = 302, P = 4e-27

TREMBL:AC004549_1 gene: "WUGSC:H_RG459N13.1"; product: "TXBP151"; Homo
sapiens BAC clone RG459N13 from 7p15, complete sequence., N = 2, Score
= 275, P = 2.3e-25
PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25

TREMBL:DM35B16_4 gene: "zip"; product: "nonmuscle myosin-II heavy
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip)
gene, complete cds., N = 1, Score = 254, P = 1.4e-17



>PIR:A56733 nuclear domain 10 protein NDP52 - human
Length = 446

HSPs:

Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28
Identities = 104/323 (32%), Positives = 158/323 (48%)
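
HSP summary lines of the form `Identities = n/m (p%), Positives = n/m (p%)` carry a count ratio and a percentage. The sketch below parses such a line and recomputes the percentages. It assumes the percentage is truncated rather than rounded, which is an inference from the values in this report (for example 158/323 is shown as 48%) and not a documented property of the BLAST version used. The helper name is hypothetical.

```python
import re

# Matches "Identities = 104/323" or "Positives = 158/323".
PAIR_RE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")


def hsp_percentages(line):
    """Map each label to (matches, aligned_length, truncated_percent)."""
    result = {}
    for label, matches, length in PAIR_RE.findall(line):
        m, n = int(matches), int(length)
        result[label] = (m, n, m * 100 // n)  # integer division truncates
    return result


line = "Identities = 104/323 (32%), Positives = 158/323 (48%)"
stats = hsp_percentages(line)
```

Recomputing the percentage from the ratio is a cheap consistency check when the printed percentages have been damaged, as they are in several HSP headers of this listing.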

Query: IS VN FLNV ART Y I F (TT KVECH YTIP PGTKPSA5 0W I C I rK VEAAC VR DYHT FVKS S V P E5TT 74 

v r *v * up v ckyt *r dwigif+v r*y*tf*w +*p 

Sbjct; 23 VI FlrSVEKFYI PGGDA^CKYTFTQHFIPRRKDtnGIFRVGMKTTMYYTFHWvTLPI DL*t 82 

Query: 75 DGS P I HT S VQFOAS YL P KPGAOL YO FR YVHRQGQVCGGS P P FQ FRE P RPMDELVT L EEAD 114 

♦ S VQF*A YLPK ♦ YQF TV* G V G S PFQFR D LV + 
Sbjct: 83 WR S AKQQE VQFK A Y Y L P KD D- E Y YQFC YV 0EDGWRGA3 I P FQ FR P ENE CD I L V VTTO- * 139 

Query: 131 GGSDILLVVPKATVLQN0-LOES---C0£AHDLM0LKL0LCG0VTE<LR3RVQEI,ERALA 189 

G * + K tNO L +S O+tN HQ *LQ * + E L*S ++LE + 
Sbjct: 140 GEV E EI EQH N K E LC KENQELK DSC 1 5 LQKQNS QMQAELQKKOEE LET LQS I N K KLE LK V K 199 

Query: 190 TARQE-HTELMEOYKGI SRSHGEITEEROI -LSRC^DHVARI LELEDD rQTI SEKVLTK 247 

+ TEL* 0 K ♦+ E* I * ♦ 0 ♦ E+E *Q +K T* 

Sbjct: 200 EQKD YVtET ELL-QLKEQMQKM5SENEKMG I RVDQLQAOLS TQE KEKEKL VQGDQDK - - TE 216 

Query: 248 E V E - LD RLRDT V KALTREO EKLLGQLKEVQADK EQSEAELQV AQQEN HH LHLDLKEAKSW 306 

♦ *E L ♦ 0 ♦ EO K + L+* +Q+E QQE N Dl t S 
Sbjct: 257 QLEQLJCKOI DM L FL3 LTEQRKDQKK LEQTVEQMXQttETT AKK KQQELMD EN FDLS KRLS E 316 

Ouary: 107 QEEQSAQAORUIDKVAQMKDTLGQAOQRV 315 

E OR 0 L ♦ R* 

Sbjct: 317 NEI ICNALORQKERLEGEItDLLKRENSRL 345 

Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27
Identities = 98/337 (29%), Positives = 163/337 (48%)

Query: 15 VNTLNVARTY 1 PNTKVECH YTLPPGTHPSASDW I G I FKVEAACVRDVHTFVHSSVPE5TT 74 

V r +V ♦ YIP V CHYT +P DHIGIF+V R+Y+TF+W +*P 

Sbjct: 23 VirNSVEKTYI PGGBVTCHYTFTQHFIPRRKDWIGIFRVCTKTTREYYTrtWVTLPI DLN 8! 

Query: 75 DGSPtHTSVOFQASYLPKPCJkQLYOmvNROGOVCSQSFFFOrREPRFMOELVTLEEAD 134 

♦ S VQF+A YLPK *■ YQF YV+ G V G S PFQTR P +6 

Sbjct: 83 NKSAKQQEVQFKA Y YL PKDD- EYYQFCYV OEOGWRGAS I PrQFR" - PENE 130 

Query: 135 GGSDI LLWPlUkTVLQN0L0E5Q0ERNDUtQLKLQLCGQVTELRSRVQELERALATARQE 194 

DIL*V 0 ***E *0 *L + +L* L* * + + * L *QE 

SbJCt: 131 --EDILWTT- QCE VEE I EQHH KELC KENQELK DSC I S LQKQN S OHQAE LQK - KQE 182 

Query: 195 KTELMEQYKGISRSHGEITEEROILSRQQGDH-VARII.ELEDOIOTISEKVLTKEVE1.9R 253 

E ♦ ♦ I ♦+ +♦ +*Q D* +L+L++ 0 +S + + +0* 

Sbjct: 183 E LET LOS IKK KLELKV KEQKD YWETEL LQLK EQNQKH5 S CM EKMG I RVDQ 232 

Query: 254 LRUrVKALTRCQEXLL* -GQLKEVQAD- - - KEOSEAELQYAQQENKHLNLDLKEAKSWQE 30S 

1+ + *E EKL* Q K Q * KE L ♦ *Q I* + Q 

SbjCt: 233 LQA0L5TQCK EMEKL VQGDQDKTEQL EQLKKEN DH LFLS LTEQRK DQK K LEQTVEOMKQN 292 

Query: 309 EQS A - -QAQRLK DKV AQMK DT LGQAQQR VAELEPLK EQLRG AQE L 351 

E + A + Q L O* * L + + I* KE+L G +L 

Sbjct: 293 ETT AK KKQQELMDEN FDLSKRLSEHEIICNALQRQKERLEGENDL 337 

Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06
Identities = 53/227 (23%), Positives = 113/227 (49%)

Query: 138 OILLWPKATV LQNQ LDE SQOERM DUtQLK LOLEGQVT ELRS RVQELEKALAT ARQEHT E 197 

DIL*V Q +**E +Q +L + +L* L* + L *0E E 

Sbjct: 132 DILWTT QGEV EE I EQH NKELC X ENOELKDSC I S LQKQN SDHQA ELOK - KQE ELE 185 

Ouary: 198 LKEQYKGI5R3HGEITEERDILSRQQGDK-VARILELEDDIQTISERVLTKEVELDRI.RD 256 

♦ ♦ I *♦ ** **Q 0* *L*L»* Q *S * ♦ *0»L+ 

Sbjct: 186 TLOS INK KLELKVK EQK3 YWET ELLQLKEQNQKMS S EHEKHG I RVDQLOA 235 

Query: 257 TVKALT REQEK LLGQLKEVOAO KEOS EAELQV AQQEHH H LN LDLKEA KSMQEEQ3 AOAOR 316 

+ tE EKL VQ D+t» + E H» +*EN HL L L E + 0** 

SbJCt: 236 QISTQEKEMEKL VQCOOC KTE-QLEQLKKENDHL FLS LTEORK DQKK LEQTVEQ 288 
Query: 317 m - □ K V AQKKDT LGQAQQRV A ELEPLK EQ L RGAQE LA - AS SOQKAT LLGE 364 

+K ♦ + KK + Q+ + E L ++L ♦ * A +QK L GE 
Sbjct: 289 MKQNETTAMK KQOELMO EN FDLSK RLS EKE 1 1 C N ALQRQKERL EGE 334 

Score = 103 (15.5 bits), Expect = 4.4e-04, Sum P(2) = 4.4e-04
Identities = 63/278 (22%), Positives = 123/278 (44%)

Query: 299 DLKEAKSWQEEOSAQAQRLKDKVAQMK OTLGQAQQRVAELEfLKEQLRGACELAAS 354 

♦+♦£ ♦ +E ♦ 0 LKD ♦ + 0 * Q+ + ELE L ♦ * EL 
Sbjct: 141 E V EE I EQHMK E LC KENQEl K DSC I S LQKQNS DMQAELQKKQEELETL - QSIHKKLELKVK 199 

Query: 355 SOOKAT LLG E ELAS AAAAR DRT I AELHKS RLE V A EVNGRLAELC LHL KEEKCQW S KE RAG 414 

0+ EL + +E + + V ++ +L* + E+ Q 

Sbjct: 200 EQKD — YW ET ELLQL K CQNQKM5 5ENEKMG I RV DQLQAQLSTQEK EH - EKLVQG DQDKTE 256 

Query: 415 LLOS VEAEKDKI-LK LSAE 1 L- - - RLEKAVQEE RTQNQV FKTE LA REKDS SLVQLS E S KR 470 

L+ ++ E 0 + L L+ ♦ + LE+ V E+ ON* T t ++++ SKR 

Sbjct: 257 QLEQLKKEN DH L FLS LT EQRX DQKKLEQT V- EQMKQN ET - - TAMKKQQELMDEN FDLS KR 313 

Query: 471 CLTELRS ALR V LQK EKEQLQE EKQELLE YKRXJ.EARLEK VADEKWN E DATTEDEEAA S27 

L+E LQ++KE+L+ E *LL ♦+ +RL *M T DE A 

Sbjct: 314 -LSENEI ICNALQRQKERLEGEK-DI.L FRENSRLLSYMGLDFNSLPYQVPTSDEGGA 368 

Query: 528 - - - VGLSC PAALTD- S E DCS P EDKRL P P Y C LC ERGDPGS S PAG PREAS PL 573 

GL+ ♦ E SP + + +C+ u ++ PL 

Sbjct; 369 RON PGLA YGN p Y 5G I QESSS PS P L5 1 K KC P I C KADD I C DKT LEQQQMQPL 418 

Score = 64 (9.6 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28
Identities = 13/29 (44%), Positives = 17/29 (58%)

Query: 651 PTMKECPICKERFPAESDKOALEOHKDGH 679 

P CPIC ♦ FPA t+K EDH+ H 
Sbjct: 417 PLC FNC PICDK I FFA-TEKQI FEDKVFCH 444 

Score = 64 (9.6 bits), Expect = 5.8e+00, Sum P(2) = 1.0e+00
Identities = 26/90 (28%), Positives = 45/90 (50%)

Query: 470 RELTELRS A L R VLQK E K EQLQEE " - KQELLE YNRK LEARLE - KV ADE K - - W S15 

♦ E EL+ + LQK+ +Q E KQE LE ++ + +LE KV t+F U 
Sbjct: 154 KENQELKDSCIS LQKQNS DMQAELQKKQEELETLQS I tt KKLELKVKEQKD YHET EL LOLK 213 

Query: 51* - -NED ATTEDE EAA VG LS-CPAALTDSEDE S42 

H+ ++E+E+ + + A L* EE 
Sbjct: 214 EQNQKM5SENEKMGI RVDQLQAQLSTQEKE 243 

Score = 47 (7.1 bits), Expect = 4.6e-26, Sum P(2) = 4.6e-26 
Identities = 11/30 (36%), Positives = 17/30 (56%) 

Query: 631 MASG fTVGT L SET STGGP AT PTW K EC P I CK 660 

♦AC + E*St P * K+CPICK 

SbjCt: 374 LAYGNPYSGIQESS5PSPLSI--KKCPICK 401 
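
The percentages on the Identities and Positives lines of the alignments above follow from the match counts by integer arithmetic. The listed values are consistent with truncation rather than rounding (13/29 is reported as 44%, not 45%). The following is a minimal sketch under that assumption; the function name `blast_percent` is illustrative and is not part of any BLAST program.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole percentage, truncated (not rounded).

    Truncation is an assumption inferred from the values printed in the report.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and the aligned length")
    return (100 * matches) // aligned


# Match counts taken from the Identities/Positives lines of the report.
pairs = [(63, 278), (123, 278), (13, 29), (17, 29),
         (26, 90), (45, 90), (11, 30), (17, 30)]
percents = [blast_percent(m, n) for m, n in pairs]
print(percents)
```

Applied to the counts in this report, the sketch gives 22, 44, 44, 58, 28, 50, 36 and 56, which can be used to cross-check percentage figures that were damaged in reproduction.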



Pedant information for DKFZphtes3_7p9, frame 3 



Report for DKFZphtes3_7p9.3 



[LENGTH] 691 
[MW] 71336.52 
[pI] 4.77 
[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 2e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YJL074c] 4e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] l genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-05 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-05 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YKL079c] 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL079c] 2e-04 

BL00682B ZP domain proteins 
3.6.1.32 Myosin ATPase 
nucleus («-I0 
phosphotransferase 2e-07 
duplication 9e-07 
citrulline 1e-09 
heart 5e-11 
endocytosis 1e-09 
polymorphism 3e-06 
cornified cell envelope 1e-05 
transmembrane protein 6e-12 
serine/threonine-specific protein kinase 1e-07 
cell wall 1e-06 
zinc finger 5e-09 
metal binding 5e-09 
DNA binding 1e-06 
muscle contraction 1e-11 
IgG constant region-binding 1e-06 
acetylated amino end 4e-09 
actin binding 1e-13 
mitosis 9e-09 
microtubule binding 9e-09 
ATP 1e-13 
thick filament 1e-10 
phosphoprotein 1e-13 
epidermis 1e-06 
leucine zipper 1e-07 
glycoprotein 4e-07 
skeletal muscle 4e-10 
disulfide bond 1e-07 
calcium binding 1e-09 
alternative splicing 1e-10 
coiled coil 1e-13 
P-loop 1e-13 
heptad repeat 6e-10 
methylated amino acid 1e-13 
basement membrane 3e-06 
immunoglobulin receptor 2e-07 
peripheral membrane protein 5e-09 
dimer 1e-07 
cardiac muscle 1e-10 
extracellular matrix 3e-06 
hydrolase 1e-13 
microtubule 6e-10 
muscle 1e-09 
membrane protein 3e-06 
EF hand 1e-09 
cytoskeleton 6e-12 
hair 1e-09 
calmodulin binding 5e-09 
Golgi apparatus 3e-08 
myosin heavy chain 1e-13 
conserved hypothetical P115 protein 1e-05 
hypothetical protein YJL074c 5e-07 
centromere protein E 9e-09 
unassigned Ser/Thr or Tyr-specific protein kinases 2e-07 
calmodulin repeat homology 1e-09 
myosin motor domain homology 1e-13 
alpha-actinin actin-binding domain homology 3e-13 
tropomyosin 3e-07 
plectin 3e-13 
trichohyalin 1e-09 
pleckstrin repeat homology 4e-06 
ribosomal protein S10 homology 3e-13 
[SUPFAM] giantin U-Ot 
[SUPFAM] protein kinase homology 2e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-06 
[SUPFAM] involucrin 1e-06 
[SUPFAM] kinesin motor domain homology 9e-09 
[SUPFAM] human early endosome antigen 1 5e-09 
[SUPFAM] unassigned kinesin-related proteins l«-01 
[SUPFAM] HS protein }«-03 
[SUPFAM] cytoskeletal keratin 3«-0l 
[PROSITE] LEUCINE_ZIPPER 3 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 25 
[PROSITE] PKC_PHOSPHO_SITE 6 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 9.12 % 
[KW] COILED_COIL 39.36 % 

SCO MEES F LS RA PS RGCVN ttW ART Y t PMT K V ECH YT LP PGTMPS ASM! C I FKVEAAC VRD 

SEC 

PRO cc e cceecc cee e c • e aacc a ••• ececcce « • e ace ccc eecccc e e e • e • ••• e cccc 
COILS 

SEQ YOTfV«SSVPESTrOGSPIHT$V0FQA3YLPI(PGA0LY0rRVV»RQCOyCCOSPPrQF«E 

SEC 

PP.0 •••»»«•« ccccc ccccchh h hhivhhh hh hccccccc a a* ccccccc ccccccccccc 

COILS 

SEQ F R FNDEL VT LE EA DGCS DILLVVPKATV LQNQL DESOOE RNDLMQLK LQ LEGQVT ELRS R 



VQC L ERALAT ARQEHTELMEQYKG I S R3HGC I TEXRDI LSROQGDKVARI LELEODIOTI 



S EK VLTK E VELDRLRDTV KALTREQEK LLGQ LK E VQADKEQS EAE WJV AQOEN H HLN LDL 
hhh h hhhhhhhh h hhhhhh hhhhhhhhhhhh hhh hhhhhhhhhh hhhhhh hhhhhhhhtift 

KEA KSWQEEQSAQAQMJCDKV AQMK DTLGQAQGRV AELE PLKCQLACAOELAAS SOQKAT 



(TIAELHRS RLE V AEVHGRLAELG LH LK EEKCQWS K E RAGLIQS V E 



AEK OK I LKLS AEI LRL C K A VQEEKTONOV FKTE LAREKDS S L VOLS CS K RE LT ELRS ALR 



V LQK EKEOLOECKOEL L E YMRKLEARLEK V AOEKWHEDATT E DE EAAVCLSC P AALTOSE 
hhhhhhhhhhhhhhhhhhhhhhhhhhhnhhhhhh hhhhnhhnhhhhhh hhhhhhhhhhhh 

0 ES P ECHRL P P YG LC ERGD PGS S F AGPREA S PL W I SOP A P 1 3 PH LSG P AEDS S S DS EAE 
hhhhccc cceccccecc c cec c cccc cc c cc a t a • • accccccceccec cc ecce c echh 

OEKSVLMAAVQSGGEEAIfL U.PELGS ArY DKASCITVCTLSETSTCG PATPTMKEC PICK 



[FTPAES DX DALEDHKDCH FITST QO F IT FE 
Prosite for DKFZphtes3_7p9.3 

PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005 
PS00005 468->471 PKC_PHOSPHO_SITE PDOC00005 
PS00005 652->655 PKC_PHOSPHO_SITE PDOC00005 
PS00005 667->670 PKC_PHOSPHO_SITE PDOC00005 
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 
PS00006 o->o CK2_PHOSPHO_SITE PDOC00006 
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006 
PS00006 lit->1(0 CK2_PHOSPHO_SITE PDOC00006 
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006 
PS00006 239->243 CK2_PHOSPHO_SITE PDOC00006 
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006 
PS00006 305->309 CK2_PHOSPHO_SITE PDOC00006 
PS00006 376->380 CK2_PHOSPHO_SITE PDOC00006 
PS00006 383->387 CK2_PHOSPHO_SITE PDOC00006 
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006 
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006 
PS00006 537->541 CK2_PHOSPHO_SITE PDOC00006 
PS00006 539->543 CK2_PHOSPHO_SITE PDOC00006 
PS00006 543->547 CK2_PHOSPHO_SITE PDOC00006 
PS00006 593->597 CK2_PHOSPHO_SITE PDOC00006 
PS00006 595->599 CK2_PHOSPHO_SITE PDOC00006 
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006 
PS00006 S12->61S CK2_PHOSPHO_SITE PDOC00006 
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006 
PS00006 652->656 CK2_PHOSPHO_SITE PDOC00006 
PS00006 667->671 CK2_PHOSPHO_SITE PDOC00006 
PS00006 683->687 CK2_PHOSPHO_SITE PDOC00006 
PS00008 39->45 MYRISTYL PDOC00008 
PS00008 107->113 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 414->420 MYRISTYL PDOC00008 
PS00008 561->567 MYRISTYL PDOC00008 
PS00008 613->619 MYRISTYL PDOC00008 
PS00016 557->560 RGD PDOC00016 
PS00029 163->185 LEUCINE_ZIPPER PDOC00029 
PS00029 475->497 LEUCINE_ZIPPER PDOC00029 
PS00029 482->504 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_7p9.3) 
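
The Prosite hits reported for this peptide are short pattern matches against the amino acid sequence. The following minimal sketch shows such a scan for two of the motifs, assuming the commonly cited PROSITE consensus patterns [ST]-x-[RK] for PKC_PHOSPHO_SITE and [ST]-x(2)-[DE] for CK2_PHOSPHO_SITE. The function name `scan_motifs` and the demonstration peptide are hypothetical. Coordinates in the sketch are 1-based and inclusive, which may differ from the start->end convention used in the report.

```python
import re

# Assumed regular-expression translations of the PROSITE consensus patterns:
#   PKC_PHOSPHO_SITE  [ST]-x-[RK]
#   CK2_PHOSPHO_SITE  [ST]-x(2)-[DE]
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan_motifs(seq, patterns=PATTERNS):
    """Return (name, start, end) hits, 1-based inclusive, overlaps included."""
    hits = []
    for name, pattern in patterns.items():
        # A lookahead with a capture group reports overlapping matches too.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            end = m.start() + len(m.group(1))
            hits.append((name, start, end))
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Hypothetical peptide used only to exercise the scan.
demo_hits = scan_motifs("MASGRKSEEDTPK")
for name, start, end in demo_hits:
    print(f"{start}->{end} {name}")
```

Both patterns are short and of low specificity, so hits of this kind occur frequently in any protein and indicate potential sites only.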
group: signal transduction 

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to 
yeast YGL099w and mouse MMR1 putative GTP-binding proteins. 

GTP-binding proteins are involved in various signal transduction pathways, transmitting the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



strong similarity to guanine nucleotide binding proteins 
complete cDNA, complete cds, potential start at Bp 31, EST hits 
Sequenced by MediGenomix 

Insert length: 3290 bp 

Poly A stretch at pos. 32(9, polyadenylation signal at pos. 3111 
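
The positions of the polyadenylation signal and of the poly A stretch can be located in an insert sequence by a simple text search. The following minimal sketch assumes the two common signal hexamers AATAAA and ATTAAA and a minimum run of ten A residues for the poly A stretch; both choices, the function name `find_polya_features` and the demonstration sequence are illustrative assumptions, not the procedure used to produce this listing.

```python
import re


def find_polya_features(seq, min_run=10):
    """Return (signal_positions, stretch_start), both 1-based.

    signal_positions: start of every AATAAA or ATTAAA hexamer.
    stretch_start: start of the first run of at least min_run A residues,
    or None if there is no such run.
    """
    seq = seq.upper().replace(" ", "")
    signals = [m.start() + 1
               for m in re.finditer(r"(?=(AATAAA|ATTAAA))", seq)]
    run = re.search("A{%d,}" % min_run, seq)
    stretch = run.start() + 1 if run else None
    return signals, stretch


# Hypothetical 37 bp sequence: signal at position 5, poly A from position 21.
demo = "GGCC" + "AATAAA" + "GGGCTTTCTT" + "A" * 15 + "GG"
print(find_polya_features(demo))
```

Spaces are removed before searching so that a sequence printed in blocks of ten bases can be pasted in directly.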



1 CGTCCACCCG TCGTGTTGCC ATCCCCCCGA GGAGAGCCCC CCCCCCTGGG 
51 TCCCTCGCAC CGCCCCTTAT CCCCCATCAC ACTCAGCCCA GCCGAAGCCA 
101 TCCTCACACT CACTCCTCCT TGCACACAAG TGAACTCAAT GATCGCTATC 
151 ATTCCCCTCG TCTTAATCTT CAGTCAGTGA CTGAACACAG CTCCCTTCAT 
201 CACTTCCTTC CTACTGCACA ACTTGCAGGA ACACACTTTG TACCTCAAAA 
251 ACTTAATATt AACTTTCTGC CTGCTCACCC TAOAACTCCA CTACTGTCTT 
301 TCGAGGACAG CCACACAATT AACAACCTCC ATGAAGAAAA CAAACAGTTC 
351 TTCTGTATAC CGAGGAGACC AAACTCCAAC CAAAATACTA CCCCACAACA 
401 ACTCAAACAA GCACAGAAAG ATAACTTTCT AGAATCCAGA CGTCAGCTTG 
451 TCCGGCTAGA AGAGCAACAG AACCTCATAT TGACTCCATT TGAACGAAAT 
501 TTCGACTTTT CCCOCCACCT CTGGACACTC ATTCAGAGAA CTCATATTCT 
551 GGTCCAGATA CTACATCCTC CAAACCCACT CCTCTTTACA TGTGAGGATT 
601 TCCAATCTTA TCTCAAAGAA ATCCATCCCA ATAAGGACAA CCTCATTCTG 
651 ATCAACAAGG CAGACTTOCT CACTCCTCAG CACCCCAGTC CCTGGGCCAT 
701 CTACTTCCAA AAAGAACATC TCAAGGTTAT TTTCTCCTCA GCTTTGGCCC 
751 CACCCATTCC CCTGAATSGT GACTCTCAGG AAGACGCAAA CAGAGATGAT 
801 AGACAAAGCA ACACAACTCA GTTTGGACAT TCCACTTTCG ACCAGCCTCA 
851 AATTTCCCAC AGTCAATCCG AACATCTCCC AGCTAGGCAT TCTCCTTCAC 
901 TTAGTCAAAA TCCCACAACC CATGAAGATG ACAGTGAGTA TCAGGACTCT 
951 CCAGAGGAGG AGCAAGACGA CTCGCAGACG TGCTCAGAAG AACACCGTCC 
1001 CAAGCAAGAG CACTCCAGCC ACCACTCCAA GGAAAGCTCT ACTCCACATT 
1051 CTCACCCTCG GAGCAGCAAA ACCCCACAGA AGAGCCACAT ACACAATTTT 
1101 AGCCATCTCG TATCCAAGCA CCAGTTACTG CACCTCTTTA AGCAGCTACA 
1151 CACTCCCACA AAGCTCAAAC ATCCCCAACT TACCCTCGGA CTCCTGGGCT 
1201 ACCCTAATCT TGCTAACAGT TCAACAATCA ACACCATCAT CGGCAACAAC 
1251 AAAGTATCTG TGTCTCCCAC ACCTGCTCAC ACAAAGCACT TTCAGACTCT 
1301 CTATCTCCAG CCTGGCCTCT CCCTGTCTGA CTCTCCTGCC TTGCTGATGC 
1351 CATCTTTTCT CTCTACCAAC GCACAAATGA CTTCCAGCGG AATCCTCCCA 
1401 ATTCATCAGA TGAGAGATCA TCTtCCTCCT GTATCACTAG TTTGCCAGAA 
1451 TATTCCAAGA CATCTTTTAG AAGCTACCTA TOGCATTAAC ATCATAACCC 
1501 CTACACAGGA TGAAGATCCC CACCCACCTC CAACATCGGA ACAACTGTTC 
1551 ACAGCTTATC CATACATGCG ACCATTCATG ACACCGCATC CACAGCCAGA 
1601 CCAGCCTCCA TCTGCGCCCT ACATCCTGAA GCACTATCTC AGTGGTAAGC 
1651 TCCTGTACTG CCATCCTCCT CCTCGAAGAG ATCCTCTAAC 1 1 1 1CAGCAT 
1701 CAACACCAGC CACTCCTACA GAACAAAATC AACAGTGATC AAATAAAAAT 
1751 GCAGCTACCC AGAAATAAAA AACCAAACCA GATTGAAAAT ATCCTTGACA 
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGCAGTCCAG 
1851 GCTGTCATGG CTTACAACCC CGGGACTCCT CTACTCACTG CATCCACTGC 
1901 GAGCTCTGAG AACCCGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA 
1951 ATAAAAAACA AAAAAGTCGT AGACTCTACA AGCACCTCCA TATGTCACCt 
2001 TGCCCTGCAA CAGAAATCTC ATCTGCATTG TCCACATCGA AAAGAGCAGA 
2051 AGCTGCCTCT TCCCTCTCCA ACTCTCCCAA CACACTACCA CTGTAGAACG 
2101 GGCCCTCCTC TTCCACACCA CGCCTCCACC CAACACTCTC CATCTCAAGA 
2151 CCAACGCCCT CCTGGAAACA CCACCTCTGA CAAAAAGGAG TCATCTCCCA 
2201 CCCCGAGAAT CCTACTCCTG CCCGGGCACA GTGGCTCACC CACCAACATG 
2251 CAGAAACCCC CTCTCTACTA AAAATACAAA AAAATTACCC ACGCGTGGTG 
2301 GCGCCCACCT GTAATCCCAG CTACTCGGGA CGCTCAGGCA CGAGAATCAC 
2351 TTGAACCACG CAGGCAGAGT TTCCAC7XAA TCGACATTGC GCCGCTGCAC 
2401 TCCACCCTGG GCGACACACT GAGACTGCAT CACAAGAAAA AAAATTTSCA 
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC 
2501 CATTCTGAGT CTCCTAGTTG GCTTCCTCCG ACTCTAAACA AGGCACTTGC 
2551 GTTCACTTAG TGTACACCGG GGCCTCACCT CCACTAACGA ACATCTAGAA 
2601 TCTAACCACC CCGTGACAGG CAMXTGCGG TATTTACTAC CTAGCCCCCA 
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT CTCCACAATA ASCAAATCTC 
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGCGAA 
2751 GATTCAGCTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTCTA 
2801 CCCATGGTCA CACAACTCTC AATATGCTTA AGACCATTGA ATTACACACT 
2851 TTACGTTGGT CAATTCTATC GTATGTAAAT TATACTTCAA TAACATAGTT 
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATCTGG TTTGGATCTG 
2951 TCTCCTCACC GAGTCTCATC TTGAAATGTA ACCCCCCTCS TCCCACCCCA 
3001 TGCCATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCCCTCAC 
3051 TGCTCTTCTC CTGATATTCA CTCCTCATCA CATCTCCTTC CTTCAAAGTC 
3101 TGTGGTGCCT CCCCTCTCTC TCCCTCCTGC TCTCCCCATA TAAGATGTGC 
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAACTTTCCT GAGGCCTCCC 
3201 TAGAAGCAAA AGCTGCTGTO CTTCCTGTAC CATCTACTGG ACCCTCAGCC 
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG 



No Medline entry 

Peptide information for frame 3 



1 HGRRPAPAGC SLGHALKRHO TORSRSKIWT DSWLHTSCLH OGYDKCXLML 
51 QSVTEQSSLO DTLATACLAG TEFVAEKLUt KFVPACARTG IXSITESQM 
101 RKLHICWKOr LClfWMWN ONTTPEELKQ AEKWUXWR RQLVKLEEEQ 
151 KLILTFFERM LDWnQLttAV IERSOIWOI VOAANPLLFR CEDLECYVKE 
201 MDANKEMVIL IHKAOLLTAE QRSAWAMYrE KEDVKVIFWS ALAGAIPLNG 
251 OSEEEANRDD ROSNTTErCH SSEDGAEISH SESEHLPARO SPSLSENPTT 
301 DEDDSEYEDC PEEEEMWOT CSEEDCPKEE DCSQCWKESS TADSEAMP.K 
351 TPOKTOIHHr SHLVSKQELL ELFKELKTGK KVKDGOLTVG LVCYFNVGM 
401 STIKTIMCKK KVSVSATPCH TKHFCTLYve PCLCLCOCPG LVMPSPV5TK 
451 AEMTCSGILP IDOKKOKVPF VSLVCOMIPK KVLEATYCIH IITPREOEOP 
501 HftPPTSEELL TAYGTMRCm TAHCOPDOPR SAHYILKOTV SGKLLYCHPP 
551 PCROPVTrOH OHORLLEMKK tiSOEI KMQLG RNKKAKOIEK IVDXTrFHQE 
601 NVKALTKGVQ AVMGYKPGSG WTASTA53E MGAGKPMKKH GHRtlKKCKSR 
651 IU.YKHL0H 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8e24, frame 3 



>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN 
CHROMOSOME I. 

Length = 616 
Sbjct 
Outry: 
Sbjct 
Qutry 
Sbjct 
Outry i 
Sbjct 
Outry: 
Sbjct 



12 LCIWI0SOrrKHRIW(RI(--«;L((HIVDS0PKAH--l<A*LRSVTHETnLbEFLMT«LCCV tl 

12 ErVACKLNIKrvP-XtARTCLLSrEtSOPIKKLHEDJKQrUriPRRPNWMOKTTPCELKO 1J0 

EF*AEK M* * E LLS EE* R K* E+«K t IPRRP*W *Q TT EL ♦ 

6S En AEKQNVTV I ON PEON P FLLS KEEAA RSKQKQEKN K QALTI P RRF HWDQTTT AVELDR 1J1 

131 AEKOM rLEVRRQL VRLE E EQKL I LT PFERNLMVROLWRV I ERS D I WQI VDARN F LLrR 190 

E***FI, WW L *L* + ♦ I*TPFERNL* NROLOTVIERSD+WOI VDARNFl r* 

12t !«RESrunnUO«LA0l.OOVECfIVTPreR«LEIWR0«RVIi^C(VVV()IvnARMFI.frR in 

191 C £0 LEG Y V K EMDAN KEN V 1 LI N KA DLLT AEORS AW AMY FE KEDV K V I FWS A LAGA I PLKG 2 SO 

LE YVKE* + K+N +L+-NKAP+LT EQR+ «♦ YT • ♦ • *r+SA A N 

IBS SAHLEOYVKEVCPSKKHrLLVNfADHLTEEOW'V*ISSYI>tENMIFFUrFaARMAA-EAKE 24 1 



JIT RCEOLETYESTS5H 260 



Query: 
Sbjct 
Outry: 
Sbjct 

Sbjct 
Outry 
Sbjct 
Qutry 
Sbjct 
Query: 
Sbjct 



0 STAKEARSRKTPOKRQIHNrSHLVSMELLELFKELMTCRKVKDCQ-- LTVGLVGYPNV J97 
ST* +E * *H* S + * * L *F+* ♦ * DC* *T CLVGYPHV 

6 STSSNCIP ESLQADEN DV H S - S R I ATLKVLEC I FEKFAS - -T L PDG KT KMT FCL VG Y PN V 312 

• GKSST I NT I KCH K K VS VS ATPCHTKH FOT LY V E PGLC LC DCFG L VMP3 FVST KAEKTC SG 4S7 

CKSSTIH +*G*KKVSVS*TPG TKHrQT* + ♦ L DC PC I. V PSF *T+A+» G 
3 CK5S T I N ALVCSK K VS VS STPG KTKK FOT IH LSEXV3 LLDCPG LV TPS FATTQADL V LOG 372 

t ILPtDOMRnHVPPVSLVCQXIPRMVLEATYGIIII-ITPREDEDPIIRPPTSEtLLTAYGYll Sit 
•LPIDO+R** P + L* ♦ IP* VLE Y I I I P E E P+++E+L * 

3 VLPI DQLREYTGPSALKAERI PFEVLETLYTIRI RIK PI E-EGGTGVPSAQEVLFPFAR3 431 

7 IWFmAH-G0PI>OPIWARYILRDYVSGr.LLYCHPPFG--ROPVTFt>«OHQRLLEWW1|ISD J73 
RCm AH G PO R+AR *WOYV*CKLLY HPFF r +H ♦ + ♦ SD 

2 RGrMRAKHGTFDOSRAARILLKDYVNGKLLrVHPPPNYFKSGSErKKEHMQltlVSA-TSD 490 

4 EIKMOLGR---"KKAltOIEM-IVOI(TFF710Eli--VRALTKCVOAVN-G"YltPGSGWTA t24 

I *L R ♦ E* +VD *F OCN V* ♦ KG M G YK + + 

1 S 1 TEKLORT A I S DNTLS A ESQL VDDEY F -QEN PH VRPKV KGT A VAMQGPV YKGRNTttQ P F 14) 

5 STASSEMGAGK-PVKKHGMRMRKEKSRRL CS2 

*+*■ * K P C + E+R+L 
0 0RRLDDDASPKYPMAQGKPL3RRKARQL 578 



Qutry: 
sbjct 



SbjCt: 1C 
Scar* • 43 



Sbjct: 



2 GRBPVTFX^IOKQRLLENKMNSOEIKMOLGPJIKRAKOIEKIVDKTFFHOEHVRALTKGVQA til 
GOT** * ♦ *t)E » R K ft I *K T TK 

5 GEDLETYESTSSHElPCSLOAOEHDVHSSRIATLKVLeGirER—FASTLPDGKTKKTrG 301 

2 VHGYKPGSGWTASTASSEHGAGK OS 
**GY P G *ST ** C* K 

6 LVCY-PMVG— KSSTIMALVGSKK 326 



t KKHGWRNKKEKSR 6S0 
I KKHNKKNKRSKQR 60S 



Pedant information for DKFZphtes3_8e24, frame 3 



Report for DKFZphtes3_8e24.3 



[MW] 75226.58 
[pI] S.St 
[HOMOL] SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME I. 5e-55 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55 
[FUNCAT] r general function prediction [M. jannaschii, MJ1444] lt-l( 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09 
[PIRKW] P-loop 1e-21 
[PIRKW] GTP binding 1e-27 
[SUPFAM] conserved hypothetical protein MG442 7t-0t 
ATP_GTP_A 1 
MYRISTYL J 
AMIDATION 2 
CAMP_PHOSPHO_SITE 1 
CK2_PHOSPHO_SITE 19 
TYR_PHOSPHO_SITE 1 
PKC_PHOSPHO_SITE 10 
ASN_GLYCOSYLATION 2 
Alpha_Beta 

LOW_COMPLEXITY 4.56 % 



MGRRRAP AGGS LGRALMHHQTO.KS RS HRHT DS* LHT S ELN DGY DWG RLNLOS VT EQ55 LD 
cc eceecccccchWiJitidhhh h hecccccccccccccecec eeccehhhhhhhhccccch 
OrLATAELACTEFVAEKLMIKrVFAEAItTGLLSrUSQRl KKLHEEKKOrLCI PRXPNWN 

WrrPEEUtOA£KOKrun(IUtOLVltLBEEOKl,ILTPrEWILDn(ROLWP.v:eMOIVVOI 
CCCChh (1 hhh h hhhhhh hh hhhhhhhhh hhhh h hhccchhhfifihhh Witih h hhcc«»«* • 
VOARK P LLFRC EDLEC Y VK EMDAH KENV I L I N KAOLLT AEOB1 AWAMY I'M EDV K VI FWS 
•ccecccccc hWihhhhhhhhccccc • • • t e eccc tihhhhhhhhhhhhhhhccc* t t«c 
ALAGAI PLHGDS EEEAMKTO AQSHTTE rCMS S fTJQAE ISHSESEHL P*W>S M LSEH PTT 



DtDDSEttDCftEtLDOWaTCSEEDCrKCECXrSOWKESSTADStAMWTyOKBalHNr 



SHLVSKOCLLELFKEUrrGRKVKDGOLTVGL VGYPHVGX55T r NT1 HGHKKVSVSATPGH 
ccccchhhhhhhhhhhhhhheccce««««««cccccccccc««»«ccccc«»»«*ccccc 
T K H fQT LY V EPGLCLC DC PGL VM P3 FVS TKAEMTCSG I LP I OQHROHV PFVSL VCQJH X PR 
cc 1 1 1 • ««ccc t • •ceeccccccccchhh hhhhhccccccccccccccc* •••cccccti 
H VLEATYGI N I ITPREDEDPHRPPTSEEIATAYCYKRGrKTAHGOPOQP RSARYI LKDYV 
Wihh hh hhececccccccccc ccechhh hh h hhh hhhhhcccccccccchn (itihhhtihcc 
SCKLLYCHPPPGRCPWrOHOHORLLENWOlSOEIKHQLGRMKKAKOIEMIVDItTrntQE 
c ce •« • * ccc cccccccc hhti h Wihfthhcccchhhhhhhhcc tahhhhhhh hhhhc c eeeh 
MVRALTKGVQA VHGYK PCSGWTAST A3SEHGAGK PVKKHGN KHK K EK S RRLY KH LOrt 
hhhhhhhe ••••cccccc«*«ccc ccccececccccccc cccchhhhhhhhhtic ec 



Prosite for DKFZphtes3_8e24.3 



44I-XS1 
49}->49« 
S31-»34 
S41->144 
C49-X52 
J2->56 
J7->61 
»3->97 
123->121 
1SJ->1J9 
DKFZphtes3_1g11 
group: testes derived 

DKFZphtes3_1g11 encodes a novel proline-rich 939 amino acid protein without similarity to 
known proteins. 

(P-loop). 
ir-SCOf notita. 

The new protein can find application in studying the expression profile of testis-specific 

unknown, proline rich protein 
1 EST hit (from testis library) 

Insert length: 3100 bp 

Poly A stretch at pos. 103*, polyadenylation signal at pos. 3041 

1 AGAGTCTTCC CTCAGCATAT TTTACGATAG ACAAGATCTT GTTCCAATGG 
51 AACAAAGTCA GGACTCACAC ACTGATTCCC AGACAACCAT TTCTGAGTCC 
101 CAACACTCCC TCAACCCAAA tTATCTTTCC CAGGCCAAGA CTCACTTCTC 
151 AGAACACTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA CCACCAAAAC 
201 TCTTAACCAG TCAAATACCC CCCGATCTCC CTCCACCTCT AGCTTCAGCT 
251 CTACTCCTAA AATACCCTAT CTCCCTACAC TCTGGCCGAT GTTCACCACT 
301 TAATTGCCAT CAT AAA TT AC ACACCACTTC GG&GCCTTAT CTTCTTATCT 
351 ATCCACACCT CCACCTTGTA CGCACTCCTC AAGGCCATGG TGAGGTTCGC 
401 TTGCATCTTG CCTTTACCCT CAGAATTGGG AAAAGATCCC AAATCTCAAA 
451 GTATCGTGAA ACAGATACAC CCCTCATACG GAGAAGCCCT ATATCACCAT 
501 CACAAAGGAA ACCTAAAATC TATACTCAAG CTTCCAACAG TCCTACTTCQ. 
551 ACAATAGATT TGCAGTCTCG CCCTTCCCAG TCCCCTGCTC CTGTACAAGT 
601 CTACATCAGG CCAGCACAAC GCAGCAGGCC TCACTTAGTA GAAAACACAA 
651 AAACTAGAGC ACCTGGGCAC TATCAATTCA CTCAACTTCA CAACCTACCA 
701 GAGAGTGACT CTGAAAGCAC TCACAATCAA AAACGGGCTA AAGTCACAAC 
751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAACAGAATC ACCAAGCCAC 
801 TTAGAAAACA CACAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT 
851 CCTTCTAGGG AATTAGCACC CCATTTAAGA AGGAAGAGCA TTGGAGCAAC 
901 TCAGACAACT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC 
951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAACCCGGC ATTCCAAACA 
1001 CCACACAGAC TTATAGCTTC TGTTGGGCGG AAGCCTCTCG ACGGGACAAG 
1051 GCCAGACAAT TTCTCCCCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 
1101 CGGACTATTC CTTACCAAGC AGTATCAAAA CAGACAAGAG CTCAGCTCAC 
1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTCTCCCC 
1201 ACCAACGGTC CACTCCACAT CACCTCAACA GCCAACAACA GCTTACTCTT 
1251 TCCAACCCAG ACCTCTTCCA CTCCCCAAGC CCACAGATTC CCAAAGTCCT 
1301 ATTGCTTTCC AAACTGCCTC ACTGGGGCAG CCTCTGAGAA CTCTTCAAAA 
1351 GGACAGTAGT AGCACATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA 
1401 GCCAGGAGTC TAACAACTTG TCCACACCAG CAACCAGAGT TCAGGCCCGA 
1451 CGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT 
1501 TAAACACAAA CTCACACACA AGCAGCATAA CCACCCCAGC TTCTATAGGG 
1551 AGAGAACCCC ACGCGGTCCT TCTCACAGAA CCCGTCATAA CCCCTCTTCG 
1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACCCAGTT CCTTCCACAC 
1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA 
1701 ACCATTCCAG TCCTTCTGAG AGAACCTGGC GCAGTCCCTC TCACAGAAAT 
1751 CACTGCACTC CCCCCGAGAG GAGCTCTCAC ACTCTCTCTC AAAGCGGCCT 
1801 TCACAGTCCC TCTCAGACCA CCCATCGCGG TCCCTCTCAC ACAACACATC 
1851 ACAGTCCCTC ACAGAGAAGC CATCCCACTC CCTCAGAGAG AAGCCATCGC 
1901 AGTCCCTCTC AGAGAAGACA TCGCAGTCCC TCCCACAGGA GCCATCGCGC 
1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCCCACTC 
2001 CCTCTCAGAG GAGCCATCGT CCTCCCTCTG AGAGAAGACA TCACAGTCCC 
2051 TCTAAGAGA* GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 
2101 AGAGAGAAGC CATCACACTC CCTCTGACAG AAGCCATCAC AGTCCCTCTC 
2151 AGAGAAGACA TCACAGTCCC TCTGAGACAA GCCATTGCAG TCCCTCTCAC 
2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCCCACTC CCTCTCAGAG 
2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTCACAGAA 
2301 GCCATCACAG TCCCTCTCAC AGAAGACGTC ACACTCCCTT GGAGACGAGC 
2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGACGAGATC 
2401 TCACACGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA 
2451 GTCCCTCAGA CAAGAGCCAC CTCAGTCCCT TGGAAACAAG CCGTTGCAGT 
2501 CCCTCTCAGA CGAGAGGACA CAGTTCCTCT GGGAAAACCT CTCACACTCC 
2551 CTCTGAGAGA ACCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT 
2601 CTGACACCAG CCATCGCAGT TCCTCTCAGA GAACCCCTCA CAGTCCCTCT 
2651 CACATGAGGC CAGGGACGCC CTCTCCCAGG AACCATTGCA CTCCCTCTGA 
2701 GAGGAGCCCA CCCACTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG 
2751 CAGACACGCC CAGCCATACT TTCTCTAGAG ATTTCAAGAA TCAAACAACT 
2801 CTCCTCCGCA CCACACATAA AAATCCCAAA GCAGGGCAAG TCTCCAGGCC 
2851 TCAAGCTACT CGATGACGCC AGGTCCGCCC CTATTATTCA TTCTCCTAAC 
2901 TCTTCATCCT GCTGCCCTTT CCACCCTTCT TTCCTCCTCA CCCACTGCCT 
2951 CCAATTCCTG CCCCCCCAGC CTGGAAAGGC TTCCATTTCT CTCTACCGGG 
3001 OGGCAGGCCG CTGACAATCG GTCTCTAArT TCTCTAAGAT CAATAAAGGG 
3051 CCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAACG 



No BLAST result 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP_GTP_A (824-832) 
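
The ORF coordinates and the peptide length given in the peptide information blocks are related by simple arithmetic: an ORF spanning bp 47 to bp 2863 covers 2817 bases, which is 939 codons. The following minimal sketch assumes inclusive 1-based coordinates and a span that is an exact multiple of three; the function name `peptide_length` is illustrative.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given inclusive 1-based bp coordinates."""
    span = orf_end - orf_start + 1
    if span <= 0:
        raise ValueError("ORF end must not precede ORF start")
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


# Coordinates as read from the peptide information blocks of this listing.
print(peptide_length(202, 876))   # ORF from 202 bp to 876 bp
print(peptide_length(47, 2863))   # ORF from 47 bp to 2863 bp
```

The check on divisibility by three is useful for spotting coordinates that were damaged in reproduction, since a misread digit almost always breaks the codon count.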



1 KEESEDSOSD SOT PI SESQH SLKPNYLS0A KTOrSCQfOL LtDlOLKIAA 
51 KLLKSQIPPO VPPPLASCLV LKYPICLQCG RCSCWCHHK UJTTMPm, 
101 IYP01XLVAT PEGHCEVRIH LGrftLAIGKR SQISHTRERD RPVIRRSPI3 
151 PSOMAKirT OASKSPTSTI DWSGPSOSP APV(JVYI*RG OMPPDLVEK 
201 txt ha pen rc rrovKHLPts dsestcsic** akvrtkktsd sktpmkpitk 
251 PXMHWtrTT HSRTTIESPS PELAAHLPPK ric*tctsta slkropkkps 
301 gpuFMOLLro suuiAFOTAn rviasvgrkp vdctppdnlh askhyypkon 
351 ardyclfssi kkokrsadki. tpagstikqe oilkcctvqc rsaqqprray 
401 sroPRPLiup kptosqscia fotasvgopl rtvqkdsssr MwirrMET 
451 SSOESKMLST PGTRVQARGR ILPGSPVKPT HKRHLKDXL.T HKEHNHPSPY 
501 RERTPRGPSE RTPJWPSMPJi HRSPSERSQft 33LEKRHH3P SQRiHCSPSK 
551 kmhsspscrs «ft3Psoiwnc spperscksl serglkspsq nsHPCPsgpp 
601 KH3P3EK3NR SP3CR3HRSF SERRHRSPSQ RSHRGPSER3 HCSPSERRHH 
651 SPSQRSHRGP SERPXHKSPSK RSHRSPARKS MK3P3ERSKH SPSERSMMSP 
701 SERMM3PSE R3HC3PSER3 HCSPSERRHR SPSCftftMHSP SEKSHHSPSE 
751 JtSnHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSfE R3HRRISERS 
801 H3PSEK5HL3 PLCKSRCSP3 ERRGHSSSGK TCHSPSERSH R3PSCKRQ5R 
851 T3ER5HR3SC ERTPJISPSEH RPGRP5GRNH C3PSER5RRS FLKEGLKYSr 
901 PGERPSHSL3 ROTKtKJTTLL CTTHKHPKAC OVWRPEATR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_1g11, frame 2 

TREMBL:Ar0tlltS_l gene: "car90"; product: "cyst germination specific 
acidic repeat protein precursor"; Phytophthora infestans cyst 
germination specific acidic repeat protein precursor (car90) gene, 
complete cds., N = 1, Score = 457, P = 2.3e-39 

TREMBL:AC004S«l_3B gene: "T16P2.41"; product: "putative proline-rich 
protein"; Arabidopsis thaliana chromosome II BAC F16H genomic 
sequence, complete sequence., N = 1, Score = 140, P = 4.2e-27 

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus 
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P = 
3.6e-24 

PIR:PMDO»» son3 protein - human (fragment), N = 1, Score = 292, P = 



>TREMBL:Anj*lltJ 1 gene: "car90"; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 
Length = 1,489 



Qutry: 47 J SPVKRTWWLKDKLTHmNHFSrY-PXRTPACPSERTPJimwR«HMPSEASQRSSL 133 

*p * T ♦ + + + T+ *♦ t TP P + E T ♦ P+ + P+E + +5 

Sbjct: S84 APTEETNYAFIEET-TYAmETTYAFACETPYEPTEETTYAmmYAPTEETTYAST 642 

Qu.ry: 534 EWUUISPSQI«HCSMRK>«H3SFSEia««PSOWIHCSPPEMCHSLSEPGLHSPS{JR3H 193 

E ++P++ * + P+ ♦ F+E * **** +P E ♦ ♦♦ +E ♦ ♦»+♦ * 
Sbjct: 643 EETTY APTEETTY A F AEET PT EPTEETTY APTEETTY APTEETTY APTEETTY APTEETT 101 

Qu«ry: 194 RCPSOWUJHSPSERSMHSPSERSHRSPSERWIRSPSOHSKBGPSEMKCSPSERRMRSPS 613 

P+ + + P+E ♦ »F+E * + P+E *P * + CP*E » *P+t +P+ 
Sbjct: 703 YAPAEETPY EPTEETT Y AFT EETTY A PTEETMY A P I E ETT YC PTEETTY APTEATTY A PT 7*2 

Oufry: »4 QKSHKGPSEtUUUISPSKKSHKSPAIUlSHUPSEUHHSPSERSKHSPSEMHHSPSEKSH 71] 

. ♦ P+E ♦ P+ ♦ *P * + P+E * ++P+E ♦ ++P+E » P+E * 
Sbjct: 7 S3 ECT PY A PT EETT YEPTCETTT APTEETT Y APTEETT Y A PT EETTY APTEET PY EPTEETT l» 

Quary: 714 CSPSEJt3HCSPSEPJU<lt3PSElUUtHSPSEK5HHSP3ERSKKSPSERMMSPLEKSAH$l.I. 77] 

• P+E * P+E + P+E ++P+E+* ++P+E++ + + P+E ++P C++ 
Sbjct: 823 Y A PT EET P Y EPT EETT YT PT E CTTY A PTEETTY A FT E KTT Y A FT EETTY APTEET PY E FT III 

Query: 774 EASMItSPSEiU«HASlTlW-KRAISERSHSF$EKSHWPLiraRCSP5ERMHSSSGKTC 112 

E ♦ + P+ + + + E + ♦ E +++P+E++ + P E ♦ P+E »♦ + +T 
Sbjct; 883 EETT Y A PT K ETTY A PTEETTY A3TEETTY A PTEETT Y A P AEETPY E PTEETTY APT EETT 942 

Quary: 633 KSPSEItSHIUPSOmOGKTSEJtSHIlSSCEltTMISPSEKKPCRPSCRMHCSPSEUMSPI. IK 

+ *p+e + +p+ +e + + e t + ft p» +p+e * *t* 

sbjet: »4 3 yafteettyafteettyafteettyapaeetpyepteettyafteettyafteetwyafi tow 

Ovary: 193 KECL K YS rPGERPSHS LSRorKKQTT 911 

♦ E Y+ F E **♦ ♦ * * T 
Sbjct: 1003 EE- TT Y A- PTEETTY APAEETPYEPT 102* 



Quary: SO 2 ERTPMPSEIH'WIMPSHMHIUPSEItSOUSLEAftKltSPSOItSHCSPSPJnniSSPSESSM 161 

E TP P+E T ♦ P+ + P+E ♦ ♦ E + + P+ + + + P+ » P+E • 

Sbjct: 763 EET P Y A PT EETTY EPTGETTY A PTEETT Y A PT E ETT Y A PTEETT Y A PTEET PY E PT EETT III 

Quary: 162 UPSGittmCSPPEJWHSt^CKLHSPS0IUKKCPS0IUtHHSPSEA3HK5PSERSHKSPS 621 

+ F+ + f t » » >S ***** * P+++ ++P+E + + P+E ♦ P+ 

3b]Ct: 123 Y APTEET P YEPT EETT YT PTEETT Y APT E ETT Y A PTERTTY APTEETTY A PTEET P YE PT 1«2 

Quary: 622 tKRHHSPSQRSKAGPSEASMCS rSERRHRSPSQAJHRGPSERaHHSFSKAJHRSPAKASK 611 

( ♦ p, C + ♦ +E + P++ ♦ P+E ♦ P+ + ♦ +P * 

Sbjet: 113 EETTY A PT RETT Y APTEETT Y AST EETT Y A PT EETT Y A PAEET P YEPT EETTY A PT EETT 942 

Query: 682 MPSERSHHSPSERSHHSPSERRNHSPSEMHCSPSERSHCSPSEARHRSPSERSHHSPS 741 

♦P+E * ++P+E ♦ ++P+E + + P+E + P+E ♦ +P+E +P+E ++P 
Sbjct: 943 Y A PTEETT YAPTEETTY APTEETT YAFAECTPYE PTEETTY APTEETT Y A PT EETKY A P I 1002 

Quary: 742 mHHSPSEIUHHSPSEAIilUISPLCItSltHSUXRSHKSPSEIWSHASIXItS-HMISEAS 800 

.tP+E * ++P+E * P E » +♦ E ♦ +P+E + + a E + + E + 
Sbjct: 1003 EETTY A PTEETTY APAEETPYEPTEETT YAPTEETTY A PTEETTY AST EETTY APT EETT 1062 

Qu«ry: 101 HSPSEK3HLSPLERSM:SPSERM»SSSGKTCMSPSEA5HA5PSCNI(QGRTSERSHASSC 160 

♦♦P+E++ P E * *P+E ++ ♦ +T ++P+E + + P+ +E + 

Sbjct: 1063 Y AF AEETP YEPTEETT Y A PTEETTY A PTEETT Y A PT EETT Y A PT EETTY APAE ET P Y EFT 1122 

Outcy: 161 CKTPJtSPSEmPGHPSClUtHCSPSEURKSPUtC 194 

t T ++P+E P» +P E ♦ P+E 

Sbjct: 1123 EXTT Y A PT EETTY APTEETWTAP I EETT YGPTEE 1156 



Quary: 471 s PV K RTWH &HLKDKLTH ACHN H PS FY - RERTPRC PS E RT AHN PSWRNHRS P3ER3QRSSL 133 

♦P + T + +R T+ ** t TP P+E T + F+ »P+E + +S 

Sbjct: 141 A PT EETTY APT - EKTT Y A PTEETT YAPTEETPY EPTEETTYAPTK ETTY APTEETT YAST 906 

QU*cy: 134 EAIUtNSPSQIUKCSPSRWIKSSPSEIlSWRSPSORNHCSPPtKSCHSLSERCLHSPSQASH 193 

E ++P++ * +P+ * P+E • +P++ +P E + ++ +E ++P++ + 
Sbjct: 907 EETT Y A PTEETTY A P AECT P YEPTEETTY APTEETTY A PTE ETTY APTEETTY A PTEETT 9*6 
Qu«rv: 194 RCFSQRRKKSPSEAStmSPSERSHRSPSERRHRSPSORSHRCPSERiHCSPSEAMRSPS 
p.. ♦ P+C ♦ *P*E . «F*E *P * * ft » *P*E F* 
SDjct: 9«7 YAP A EET PYE PTEETTYAFT EETTY A FT EETMY A F I EETTY A PTEETTY A FAEET PY EFT 

Ou«ry: (34 0UHRGPSCRRHIISPSKR5HRJPARRSKMPSEMHHSPSER3HHSP3CRRHHSPSERSH 

♦ ♦ p*e **p»* + ♦ * *r*t * **r*t ' * p*e **p*e ♦ 

Sbjct: 1011 EETT YAPT EETTY A PTEETT Y ASTEETTY APTEETTY AP AEET PYE PTEETTY AFT E ETT 

Outiy: 114 CSPSEMKCSPSEWWR3PSEARKHSPSEK3MHSFSER3HHSF3ERPRH3FLEHRKSU. 

*P*E * *F+E *P*E *+P*E»+ ♦ P*E + **P*E *»P E * »* * 
Sbjct: 1017 Y APT EETTY A PTEETTY A PTEETTY A P A EETPY EPTtETTY APTEETTY A PT EETIf Y AP I 

Qu«iy: "« tRSKR3PSEM3HR3r£J«-*(RRISER3HSPSEK»HLlPLER3RCSPSCRRGHS3SCKTC 

E * F*E ♦ * £ * * E ♦♦P»E»* P ♦ *P*E ♦* * *T 
S&Jet: 1147 EETTYGPT E ETTY A PTEATTY APT EET PY A PT EETT Y E PTCETTYA PTEETTY APTE ETT 

Qu.ry: 833 KSP3ERSHMPSCHR«RTSERSHR3SCERTRHSPSEMRPCRP3CRKHCiPSER3R*SPL 

.♦ P , c ♦ *P* »E * * E T * HE P» *F*E • *P 

Sbjet: 120? Y APTE ETT Y A PTEET PY E PT EETT Y A PTEETT Y C PTEETTY A PTEETTY A PT EETT Y APT 

Outry: 193 KE 894 

♦ C 

Sbjct: 12(7 E£ 1268 
Scot* - 439 <M.» biti). Expect - I.0«-37, P - 2.0»-J7 



Qu«ry: 
Sbjct: 

Qu.ry: 911 ERRHR3P30RSHC3PCR»«MSSP3ER3WRSP30RJIKC3rPER3CM3L3ERCLH3P30R5H 393 

E t *p* * P*E + *w** *r t * ** *e **p»* + 

Sbjct: 499 EETT Y APTEETTY A FACET PYE PTEETT Y A PTEETT Y A PT EETT Y A PT EETTY A PTEETT 338 

Ou«ry: 394 RCPfORRKMPftll3H«PSER3MR3«EMUiRSPSOMKRCPIERSKC3PSERAHRSP3 «S3 

P»* * P»E * »P*E * *P*E *P * * F*E ♦ *P+E P* 
Sbjet: 339 Y A FAEET FY E PTEETTY A PTEETT Y APT E ETWY A P I EETT Y A PT EETT YAP AEET PY C PT *1« 

Quary: *34 QMN1u;P3ER1UOISPSKR3H«3PARRSI1R3PSER2MH3PSER3KH3P$ERRlUtSPSER3H 1 

Sbjct: 



Quaiy: 714 C3P3ERSHCSPS EJUtHRS PS EARKHS PS EK3KH3 PSER3HH3 PSERRRHSP LER3 RM 3LL 773 

*P*E ♦ *PtE *P*E **P*E»* + P*E * «P*E +*P E * ** ♦ 
SbJCt: €79 Y A PT EETTY A PTEETTY A PTEETT YAP AEET PYE PT EETT Y APTEETTY APTEETMY A P 1 738 

Sbjct: 

Qu«ty: 833 HSPICRSHRSP3C((X0SRTSERSHRSSCERTlUtSPIE»tRPGRP3CMKCSPSER3RR3PL 892 

+ *P*E + *P T E ♦ + B T * + P»E P P* +P*E * »P 

SbJCt: 799 YAPT C ETT YAP- TEETFYEFT-E 

Ouary: 893 KEGLKYSITGERPSHS 108 

♦ E Y+ P Et +♦* 
Sbjct: 831 EE- TT Y A - PT EKTTY A t«4 

Ou*ty: 302 £RTPRCPSERTRHNP3WUIHRSPSER30RS5LERJUtH3PSOR5HCSP3R]aiHS$P$ER3M 111 

C TP P*6 T + P+ +P*E ♦ ♦ E+ *+P+* * +P+ ♦ F*E * 

Sbjet: 419 EET P Y E PTEETT YT PTEETTY A PTEETT Y APTEKTT Y A PTEXTT Y A PTEET PYEFTEETT 478 

Query: SM R3 PSORVHCS P PER3CH3 LSERCLH S PSQR3 HRC P30RRH KSPSERSMRSPS ER3HR3 PS 621 

+P.+ ♦![».. *Z *»P+* ♦ P** * F+E ♦ *P*E . *t* 
Sbjct: 479 Y APTKETT Y A PTEETTY ASTEETTY A PTE ETT YAP ACET PYE PTEETT Y APTEETTY APT 538 

Outiy: t2I ERRMR3PSQRSHRCP3ERSKCSPSERRKR3P3{JRSMRCPSERRHMSPSKRSMR3FARRSH <81 

e *r** * p*e » «p*e p*+ ♦ p+e * *r ♦ 

SbjCt: 339 EETT Y A PTEETT YAPTEETTY A PAtETPY E PTEETTY A PTE CTTY A PT EETHY AI 1 EfTT 398 

Ootry: «82 RSFSERSHKSPSERSHKSFSERRHHSPSER3HCSPSERSHCSFSERR«R3P3ERRMKSPS 741 

+P*E * "FtE + » P*E »*P*E • +P»E + * tE *P*E »*P* 
Sbjct: S99 Y A PTEETTY A P AEET PY EPTEETTY APTE ETT YAPT EETTY ASTEETTY APTEETTY APA til 

Ou*iy: 742 EXSHHSPSEHSKHSF3ERRRMSPt.ER3R>ISLLERSNRSPSERR3MRSrER3-KRRISER3 «00 

E*» * PtE ♦ **P*E **P E + *. E * *P*E E ♦ ♦ E • 

SbjCt: «39 EET FT EPT EETTY APTEETTY APTtrTTY APTEETTY A PTEETTY A P AEET PYE PTEETT 718 
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Query: 101 H3»tK£HLSru:KSKCSPSCRIiCH3SSCKTCHSrSEItSHlt5MO«R(X>nT3C»HRSIC 360 

++P+E++ +p e + +p e * * *t ++p»e » +p+ +c * 
tb}ct: ii» T»fTCcnY»rrtE™»A»ifXTT¥cmETTtAPTEATTY*FTenp«iTCtTTYcrr 771 

Query: til EKTMI3P3WKItP3GNtHC$r$EK3RRS(>LKCGUYSrPGCRrSN3L3KDrKM<R-r lit 

T ++P+C P* + P+E + +P +C Y PC + + + + + + T 

Sbjct: 779 GETTY A PTEETTY APTEETTY A PTEETTY APTEE -T PY E - PTEETTY APTCET PY CPT «« 

Score - 43» tt*.2 blM). Expect - J.1.-36. P - 3.1e-3* 
Identtele* - It/HO I20M, Po»itivei - 22t/*«0 (Sill 

Query: «7J PC3PV((im*MWlLltOKLTHl<E«l<HPSm-ErrPMPSERTAK»(PSWWIHRSPSCRS0»» 531 

p P ♦ T * K* T» +♦ E T P+E T + P* P+E + ♦ 

Sbjet: <10 PYtPTtETTYAPTKET-TYAPTIETTYASTEETTYAPTEETTtAPAEETPrEPTEETTYA 321 

Query: SJ2 SLCWWHSPSOASHCSPSRKKHSSPSERSWRSfWRUHCSPPEflSCHJLSEBGLHSPSO* 59 1 

c „•++ . +.+ * , P+ r . + p ++ p t * ♦* +e ++P++ 

»jCt: S29 PT C CTTYAPTEETTY APTEETTY AFTEETT YAP AECT PY E PTEETTY A PTEETTY A PTEE 311 

Ouery: 392 SMUOPSOMKHSPSEASHMPSEMHMPSEWWRSPSORSKBCPSEASHCSPSEMHM 651 

♦ P ♦ + +P+C » + P+I • P+E +P+* * P*E • • +E * 
Sbjct: SB9 TMY A P I EETT Y APT EETTY A P AEET PY EPTEETTY APTE ETT Y APTE ETT Y A3 TEETTY A 641 

Ouery; (32 psgitSHItGPSElUUIKSPSKItSHPSPAMSHMPSCJUHNSPlEKSKKSPXCMHNSPSEK 711 

p»* * P+E * P+ + ♦ «P + + P+E + *+P+E * + + P+E ++F+E 
Sbjet: t«9 PTEETTY A F AEETPYE PTEETT Y APTEETTY APTEETTY A PTEETTY A PTEETTY A PACE 701 

Qu.ry: 712 SHCSP5CR3HCSPSCM»KSP3EIUU{H3PSCKSHK5PSEA3HH3rSEMlUHSPLCnSPJIt 771 

* p.E * + P+E +P+E ++P C++ » p+e + ++P+E ++F c + + • 
Sbjct: 709 TPYEPTCXTTY A PTEETTY AFTEETHYAP I EETTYCFTEETTY APTEATTTAFT EET PY A 76S 

Query: 7T2 LLtJUHM PStAAJHRirCRS-HIOll $IR3HSPSCXSHLSPL£«RCSrsrRAOHSSSCX 130 

E + P+ + + E * * t +++P+E++ »P E + P+E ♦ ♦ ♦ * 
8b Jet: 769 FT EETT YEPTGCTT YA PT EETT Y A PT EETTY APT EETTY A PTEET PYE PT EETT YAPT E E 326 

Query: 131 TCHSPSEMHMP»C»0WATSEI«HPi3CEllTI<KSfseMRPCRP3CWit»C3PSEP.SRPS 390 

T + P+E ♦ +P+ +t + » E+T ++P+E P+ P+E » + 

IbjCt: 12 » T PY EPTEETTYT PTEETT Y APTECTTYAPTEXTTY APTEETTY A PTEET PYEPT EETT Y A ttt 

Qu.ry; HI PLKCGUtYSrPCEAPSHSWRD 912 

P KC Y+ P C +♦♦ » + 
3b jet: M9 PT«-TTYA-FTCrTTYASTEE »0I 



Query: S02 ERTPIKPWTIUOIPSirWiHMPSERSWSLEAIWHSPlQRSHCSPSAKKHSSPSEK" 361 

E T GP+E T + P+ +P+E + ♦ E + P+ + +P+ * + P+C + 

Sbjct: 739 CETTYGPTEETT Y * PTEATTY AFTEET PY APTEETTY EPTGETTY APTEETTY AFTEETT 791 

Queryt 3C2 R»PSQIWHCSPPEItSCHSLSEl»GLH3PIQASHW;PSORRKHSPSEIlSH«PSERSHPJPS 621 

+ P+ + +P t + + +E + *F + » * P+» + 'P*E * *P*E + *f 
Sbjct: 799 Y A FT EETTY A PTCETPYCPTEETT t A PTEET p YEFTEETT YT PT CETTY APTCETT Y A PT (it 

Query; 623 EKIWIUPS0IISHMPSEIUHC5P3EIWMPMMHP£P3E1UU«HSPSKRSHRSPAKK3H SSI 

E+ + F+ + « P+E + P+E +P++ + P+C •+ ♦+ + +P + 
SbJCt: »S9 EKTT Y A FT CETTY A PTEET PY E PT EETTY APTKETTY A PTEETT Y AST C ETTY A PT EETT 910 

Query: 612 «PSER3MKSPSEHSKMSPSEAW<HSPSEAStlCSPSE«HCSP*EAW<«P3EA*KKSPl 7*1 

+P+E + ♦ P+I ♦ + + P+E ++P+E + *P+E + +P*E +P+E + P+ 
SbjCt: 919 YAP A EETP Y EPTEETTY A PTE ETTY A PTCETT YAPTEETTY A PTEETT Y AP AEETP Y CPT 971 

Qu.ry: 742 CMKHSPSEASMMJPSCHARHSPLERSHHSLLEMHMPSERMHRSrEJlS-HMlSCM BOO 

E+* ++P+E + ++P+E ++P+E ♦ «♦ t » +P+C ♦ E * + C » 
Sbjct: 979 EETT YAPTEETT Y APTEETKYAP I EETTY APTECTTYAPACCT PY CPTEETTY A PTEETT 1031 

Ouery: SOI HSPSEXSHUPLEK3IUSPSE)UUM3SSCKTCM3P3EA3HIUP»CKK0CKTSEMHItS3C IU) 

++>+£++ + E * +P+E ♦+ » +T ♦ P*E ♦ +P+ +E • + 

Sbjct: 1039 Y APT EETT Y AST EETTY APT EETTY A PAEETPY CPT EETTY A PTEETT YAPTEETT YAPT 109t 

Ouery: ttl ERTPJCS P3EM&PG APSCPJIHCS P3EKS AA3 PLKE t» 

E T +*P+t P» P+E ♦ «P +E 

Sbjct: 1099 ECTT YAPTEETT Y AP AEET P YEPTEETTYA PTEC 1132 



Query: i02 ERTPRGPSCrTRHIIPSHMHRSPSEMQMSUPJUtHSPSQRSHCSPSItKNHSSPSeHSII 5S1 

E T P+C T » P+ +P+E * + E + P++ * +P+ * *P*C ♦ 

Sbjct: 939 EETTY A PTEETT Y A PT EETTY APTEETTYA PAEET P YE PT EETTY APTEETT Y A PTEETH 991 
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Query: 5*2 HSPSQRKHCSPPEHSCHSLSERGLMSPSIJRSKRCPSOWIHHSPSERSMRSPSERSHIISPS £21 

+p ♦ .p e . « *i * p*+ * p++ ++P+E * + +£ + +P+ 

SbjCt: 99* YAP I EETTY A PTEETTY A FACET PYEPT EETT Y A PTE ETTY APTEETT Y A3TE ETT Y A PT 1059 

Query: 622 iiwhiiSPSOIWHIWSPSERSHCSPSERIWIBSPSOFSMRCPSEIWHHSPSKKHRSPAIWSH «B1 

E +P++ * P+E + +P+E + P++ * P+E + + F+ + + + PA + 
Sbjct: 1059 EETTY A PA CCT PY E PT EETTY A PT EETTY APTE ETT YA PT E ETTY APTECTT Y A FACET P 11 IS 

Query: (82 RSPSERSHHSPSEPJMKSPSERPJfKSPSERSHCSPSEWHCSPSERRKRSPSERRJHHSPS 741 

p*E + ++P+E * ++P+E ++P E + P+E + +P+E + P+E *»P* 
Sbjct: 1119 Y EPT EETTY APTECTT YAPTEETH Y A P I E ETTYGPTEETT Y APTEATTY A PTEET PY APT 117B 

Query: 742 EKSHHSPSEPJHH5PSERRPJISPLERSlWSl.LERSNRSPSERWKMri:RS-HRRISER3 BOO 

. p* + **p+E *+P E + + + E + + P+E + E + + E * 
Sbjct: 117» EETT YCPTG ETTY A PTEETTY A PTEETTY AFT EETTY A PTEET PY EPT EETTY APT E ETT 1231 

Query: 101 KSPSEKSHLSPLEHSRCSPSEPUCMSSSGKTCHSPSEAiHRSPSGMPOGPTSERSHRSSC ««0 

♦ P+E++ + P C + +P+E ♦ • + +T ++P + + P+ +E + * 
SbjCt: 1239 YE PTEETTY APTECTT YA PT EETTY APTEETM YA P I DETT YG PT EETT Y A PTEATT Y APT 1299 

Query: til ERTPJISPSEHRPCWSCRWICSPSEASRRSPLW: 194 

E T ++P+E P+C +P+E + + P +E 

SbJCt: 1299 EET PY A PT EETTY CPTGCTT YA PT EETTY APTEE 1332 

Score = 422 (63.3 bits), Expect = 1.4e-35, P = 1.4e-35 
Identities = 84/407 (20%), Positives = 216/407 (53%) 

Ouery: 502 EPTPFCPSERTRHNPSWRMHlWPSERSORSSLERWIHSPSaPSMCSPSRWtKSSPSERSH 561 

E T P+E T + P+ P+E + + E + P+ + * + P+ * + P+E + 

SbjCt: 795 EETT YA PTE ETT Y A PT EET PY E PTEETTY APT EET P YE PT EETT YTPTEETTY AFT EETT 154 

Query: 562 ASPSQIUIHCSPPEHSCHSLSERGLHSPSORSHRGPSORRHHSPSERSHRSPSERSHRSPJ 621 

+F+** tf C ♦ ** « + P++ + P+* ++P+E * + +E + + P+ 
Sbjct: 855 YA PTEKTTY APTECTT YAPT EETPY C PT EETT Y AFTK ETTY A PTEETTY ASTCETTY A PT 914 

Query: 622 CPJUHRSPSQRSHRGPSERSKCSPSEfJUIRSPSQNSKRGPSERPJIHSFSKASHUPAARSH 681 

E + P++ ♦ P+E + + P+E +P+ + * P+E ++P++ + +PA + 
Sbjct: 915 EETTYAPACETPY EPT EETT YAPT EETTY APT EETTY APTEETTY APTECTT YAPAEETP 974 

Query: 6(2 RSPSEMKHSPSERSIIHSPSERRHKSPSERSHCSPSERSHCSPSCRRKRSPSERRKHSPS 741 

r,K * *+P«E + ++P+E ++P E • +P+E + +P+E P+E ++P+ 

Sbjct: 971 YEPTEETTT A PT EETT YAPTE ETMY API EETT Y APTECTT Y A F AC ET F YEPTECTTY APT 1034 

Query: 742 EK3HH S PS ER5 KH S PSERARKS PLERS KK51iLER5 HR3 PSERAS HRS rERS - H RR I S ER3 100 

€+• ..P+E + ++ +E »+P C++* E ♦ P.E .♦ E ♦ + E + 
SbjCt: 103 J EETTY APTEETTY ASTEETTY APT EETTYA P AEETP Y C PTEETT Y APTEETTY APTECTT 1094 

Query: 801 HSPSEKSHI^PLEP^RCSPSCRRGHJSSCKTCHSPSERSHRSPSCKRQGRTSEFtSHRSSC 860 

++P+E++ +P E + + P+E * + *T ++P+E » +P+ E ♦ 

Sbjct: 10SS Y APTECTT Y AFT" ETT Y A P AEET P Y EPT EETTY APTEETTY APTEETM YAP I EETT YGPT 1154 

Query: 961 CRTRKSPSCIIRPGPPSGPJIHC3PSCKSRRSPLKCCl.KY3rp<;ERPSHS 909 

C T ++P+E P+ + P+E » PC Y+ P E +++ 

SbjCt: IliS EETT YAPT EATTY A PTEET P YAPT EETTY EPTGE-TTYA-PTE ETTY A 1200 

Score = 421 (63.2 bits), Expect = 1.9e-35, P = 1.8e-35 
Identities = 86/418 (20%), Positives = 219/418 (52%) 
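
The percentages quoted after the identity and positive counts can be rechecked from the fractions themselves. The sketch below is illustrative only: the helper name blast_percent is hypothetical, and it assumes the report truncates the percentage rather than rounding it, which is what a figure such as 86/418 (20%) suggests.

```python
def blast_percent(matches: int, aligned_len: int) -> int:
    """Percentage of matching positions, truncated to an integer.

    Assumption: the report truncates (floor) instead of rounding;
    86/418 is 20.57..., which is printed as 20%.
    """
    return (100 * matches) // aligned_len


if __name__ == "__main__":
    # Counts taken from the alignment summaries in this listing.
    print(blast_percent(86, 418))   # identities
    print(blast_percent(219, 418))  # positives
    print(blast_percent(216, 407))  # positives of the preceding alignment
```

A check of this kind is also a quick way to spot a digit that was misread during extraction, since a count that does not reproduce the printed percentage is suspect.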

Ouary: 491 MltMNMPSrYRI^TPRCPSERTRHHPSWWIHRSPSERSORSSLCRRMMSPWRSHCSPSR 550 

H K E T P+E T + P+ +P+E + + E + P+t ♦ +f+ 

Sbjct: 376 H YAH I CK PC DTEVTKY APTEETTY APTEETTY A PTC ETTY A PTEETPYEPTE ETTYT PTE 435 

Ouary: 5SI KMHSSPSERSWRSPSOPJrHCSPPCRSCHSLSCRGLHSPSORSKRGPSQRRHIISPSERSHR 610 

♦ +P+E + +P+++ +P £ + ++ +C + P++ + P++ ++P+E ♦ 
Sbjct : 4 36 ETTYAPTEETTY A PTEKTTY APT EETT Y A PTEET PT EPTEETTY A PTKCTTY APTEETTY 495 

Quary: til SPSER5HRSPS ERRH RSPSORS HRCPSER5HC3PS ERRH R3PS0R5KRCPS ERRHH S PS K 670 

+ +E + +P+E +P++ + P+E + +P+E +P++ + P+E ++P++ 
SbjCt: 496 ASTEETTY A PT EETT YAPAEETPYEPTEETTYAPT EETT Y A PTEETT YAPTECTTYAPTE 5J5 

Query: 671 RSHR3PARPSHRSPSERSKMSPSER3HHSPS ERRHH5 PSERShCSPSERSHCSPS ERRH P 730 

t +PA + P+E » ++P+E + ++P+E ++P E ♦ +P+E + +P+E 
SbjCt: 556 ETTY APAECTPYEPT EETTY A PTE ETTY APT EETHY API EETTY A PTEETTYAP AEET PY 615 

Query: 731 3P3E{UU<HSPSCKSKHSP3ERSHHSP3ERRRHSFLERSRHSI.I.ERSHRSPSERRSHRSFE 790 

P+E *tf*t** ++P+E + ++ +E ++P E + ++ E + P+E ++ E 
SbjCt: 616 CPTE ETTY APT EETT YAPT E ETT Y A S T EETT YAPTCETT YAP AEET PTE PTEETTY A PT E 675 

Query: 791 RS-HRRISER3HSPSEKSHLSPLER3RCSPSERRGHSSSGKTCHSPSERSHRSPSSHRQG 649 
+ + E +++P+C++ +P E + +P+E + + +T ++P+E ♦ +P+ 
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«7* OTT APTEETTY AFTEETTYAPTEETT YAP AEETPYEPTEETTY A PTEETT YAPTEETM* 73} 

tlO KT9ER$KMSCERTRH$PSEH)tKK»SGIUtHC$r3EllSltltSrLKECLKYSri>CEIIKHS 901 

E » E T **P+E P« *P*E ♦ P E Y+ P E **♦ 

734 API EETTYG PT EETTY A PTEATTY APT EET PY A PT EETTY EPTGE - TT Y A- PT EETT Y A 792 



Ouoty: •«! 
Sbjct; 1J07 
Scot* - m 



SC UPSQIUniCSrPEKSCttSLSEftGUHSPSOKSHACPSORKKHSPSCMHKSPSEftSHKSPS (21 
+P+* *F E ♦ «♦ *t **P»t + t** * P*E ♦ *P*E * + F* 
1031 YAPTEETT Y APT EETT Y ASTEETT Y A PTEETT YAP AEET P Y E PTEETT Y APTEETTY A PT 1090 

C22 EIUUlMPSCKSKItCPSEKSKCSPSEIUUIIt3PSa«SH)tCPStKKKHSPSKK3HK3?ARltSH it I 

t *r** ♦ p»e . +p«e p« . p*e * *r * 

1091 EETTY APTEETTY APT EETT YAPAEET PY EPTEETTY APTI ETTY APTEETMY AP I E ETT 1110 

«I2 MFSERSHHSPSEMHKSPSEAltMHSPSERlHCSPSEUKCSPSEARHUPSEMHKSPS 741 
P*E * *+P+E + **T*t **P»E * P* ♦ »P*E *P+E **P* 
YCPTEETTY APT EATTYAPTEETPY APT EETT YEPTGETTY A PTEETT If APTEETTY APT 1210 

712 CMHHSMEUKHSPIEMUUtSPLEKSAKSLLCRSKMPSCMSHRSrEU-HMISEIU BOO 
I** **P*E + • P»l »H • < E * +F+E »♦ Z * * E 
1211 EETTY APTEETF Y EPT EETTY A PTEETT YEFTEETTY APTEETTY A PT EETTY APT EETM 1270 

• 01 ttSPSEKSKLSPLEMUCSPSEWWMSSSCKTCHSPSIMHMPSCMRQCATSEASMMSC StO 



L (10*1. Poiltii 



S«2 KH-SOMHCSPPCRSCHSLSEHGLHSPSCmSMRCPSOIUUHHJPIERSHBSPSEHSHMPS 621 
+P** +P E ♦ + ♦ £ ♦ * P+* ♦ ♦ *E + *P*E • *t* 

1007 Y APT EETTY A P AEET PTE PTEETTY A PT EETTY APTEETTY ASTEETT Y APT EETTY A PA 1 

622 EMWUPSQtUHIUPSCKSHUPSEJIMUrsOMKF^PSEMHKSPSKMHMPAKKSK «*1 
E P*+ * P*E + »P*E *t** * P*E **P*» ♦ r * 

10S7 EET P YEPT EETT Y APTEETTY AFTEETTYAPTEETTY APT EETTY A P ACCT PYEPT EETT I 

U2 RSFSEASMHSPSEKSHKSPSERiUIKSPSEIUHCSPSEMMCSFSCIUUIRSFSEP.RMHSPS 7 
,p.I ♦ **f*t * *+P E « P*E * +P+E * *T*K *P*E + P+ 
1127 Y APT EETT YAPTCETMY AP 1 EETT YCPTEETTY APTEATT Y A PT EET PY APT EETT Y E PT 11(1 

CKSHHSPSE)t3HH3PSElUUUISPLEft3IU4SLLE»H»P3EM3HUFEXS'HRRISt«S 100 

»• + tF*I * •*P»E **' E * *• E • P*E ** E + ♦ E * 
CETT T APT EETTY APTEETTYA PTEETTY APT EET PY E PT EETTY APTEETTY EPTEETT 121 

M SF SEKSHU PLERSftCS PS Eft AGH 3 33GXTC KS PS ER3 K R3 P3CMRQGRT SERSHA3SC 1 60 
**F*C** *P E ♦ *P+E ♦* *T ♦ P*E • *P* + E + ♦ 

Y APTEETTY A PT EETTY APT EETMY A P I OETT YGPT EETT YAPT EATTY APTEET P Y A PT 130 



1H7 



SLfTSn>CC*PSHSL5RD >12 



EETTY EPTCETT YAPT EETTY A PTEETTY A PKEE - T PYE - P AEESTST V ST E 133* 






• 7} K3PVIUtTWH)Ult.K0KLTHKEHH!IP3rYH-EKTPtlCPSCNTKintPSinUIKIt3P$Elt30lt3 531 

P P * T * T* E T P+E T + P+ P*E * + 

171 PYEPTEtmf APT AET-TYA PTEETTY ASTEETTY A PTEETTY A P AEET P YEPTEETTYA 93* 

132 3LEMUIK3PS0K3KC3PSIUtMKSSP3E)lSHIUPI0RMHCSPPEK3CHSLSEKGLH3PX}n 591 

C *tp++ . tp* + .p*i t »p+* p E + ♦* +E ***** 

937 PTEETTY A PTEETT Y A PT EETTY APTETTTY A PAEET PY E PTEETT YA PTE ETT YA PTEE 99t 
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Sbjct: 997 THYAPI EETT YAP7EETTY APAEETPYEPTLETTY APTEETTYAPTEETTYASTEETTYA 101 
Ouary: S52 PSOMKPtGPStlUtHMSPSKMWSSPAimiMMfSEMHHSPSEItSHKSI'SCMJWiPSC* 711 

P.. . he • r»» • *p ♦ *p+e ♦ **p*e ♦ **P+t **p*c 
sbjct: ios7 rTErnYAPAirrPYCPTCtrrYAPTtrrrYAM-crrTYApTcrTTYAPTCiTTYApAxt in 

Quary: 712 SHCSP»Cl««CaPSenlWHSPSERW«l»P*CKiHHSMCMIIHSPSEW«HSPL«SMS 171 

+ p*e + *p«c *p*e **p e** ♦ P*e ♦ ♦♦P»E *»p c ♦ ♦+ 

Sbjct; 1117 TPYErTCETTYAPT EETT Y APT EETNYAPL EETTYGPTCETT Y AP7EATT YAPTEET PYA 117. 
Quary: 772 LLEKSHUMEMtSKUrEU-HMlSEllSHSPSCKSHLSPLCKSKCSmMCHSSSCK 130 

t . r . ** £ * • E ♦♦*p*e+* *r t * p*e ♦ * 

Sbjct: 1177 PTtETTYEPTGETTY APTEETTYAPTEETTYAPTEETTYAPTtrTPYCPTEETTYAPTEE 12J. 

Ouary; t3l TCKSPSERSHBSPSGIIROCP-TSERSHMSCERTRKSPSCHRPCRPSSftKHCSPSERSRRS ISO 

T * (*t * *P+ *C • * E T **P + P* *P+E ♦ * 

SbjCt: 12J7 rTYEPTEETTYAPTEETTYAPTEETTYAPTECTHYAPI DETTYGPTCETTY APTCATTYA U9 

Ouary: 191 PLM (94 

p *E 

Sbjct: 1297 PTEE 1100 



Quary: iOl RE*TPRCPSERTR«>IPSW(Uir(HSPSCMOMSUERXHHSP*0RSHWPSP.Ii-MK$SP5ERS S60 

RE T PSE T * P *P+E* *E + + *+ *t** t+P+EP, 

St>j«: 319 REETT A A PS EOTT YAP REVT P Y APT E It I Y - - DV EETTY VT EESTY - A PT KS ETH APTERH 371 

Ouary: S« WRSPSQRKMCXPPIPJCKJLSERGLHS PSORSHP-GPSOPJUtXSPStASMMPSEPJHRSP C20 

. *. C E ♦ ** *E ♦ *?♦♦ * P** **P*E * P*E » *P 
SbJCt: 37< H YAH! CK P-CQT- EVTMYAPT EETTY APT EETTYAPTEETTYAPTEiTTPYCPTEETTYTP 433 

Ouary: (21 SEKIUIllSPS^H*£PSEU)lCSPSEKIWKSPSgRSHP£PSElUU(HSPSKlUHn3rAAXS CIO 

♦ E .P*t + P*E»* *P+E +P+* * P*E **P*K + tp * 
SbjCt: 434 TEETTY A PTEET7 YAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 491 

Ouary: Ml HJtSPSEKSKMSPSEUKHSPSEARKMSPSEKSHCSPSEJUKCSPSEMUtftSPSEItMIHSP 710 

♦ *E * * **P*E * P*E * tPtE * »P*I +P»E **P 

Sbjct: 494 TY ASTEETTY APT EETTY A PACETPYEPTEETT YAPTEETTY APT EETTYAPT EETT YAP 11 J 

Ouary: 741 SFJt3HMSPSCASKHSPSEIUUUtSPLENSAHSl.LERSHMPStKp;S)HR5FEU-KRXI3EP. 79» 

♦E*» *»P*E * ♦ P*t **P E * E + *P E ** E * » E 
SbjCt! 114 TECTTY A P AE ET PYEPTEETTY APTEETTYA PT EETMY AP I EETT YAPTCETTY AP AEET 61 J 

Ouary: tOO SHSPSEKSHLSPUlPJRCSPSEKACiKJSSCHTCHS P5CR3MPJP5CJW0GPTSEPJHPSS »SJ 

♦ P*E»* *p E * *P+£ **t* *T **P+E * +P* •£ * + 

Sbjct: 611 PY EPTEETT Y APTEETTYA PT EETT TASTE ETT Y A PTEETTY A P AECT P YE PTrETTY AP (73 

Ouary: 1(0 CERTKHSPSEMtPCRPSCPJIKCSPSEASAKSPLKE (94 

E T "P+E P* »P*E ♦ *P *E 

Sttjet: »74 TEETTY APT EETTY A PTE ETT Y APT EETTY A P A££ 70t 

Scora • 
Idantll 

Quary: 475 SPVKPTWMIU(IJlDKLTMKEHKHPSFY-lu;F.TPRCPSEPTPJ<NPSt(R«HR3PJEM0R3SL 133 

tp ♦ T * *+♦ T* ♦ ♦ E TP P*E T + P* *P*E ♦ +S 

SbjCt: 992 APTEETJIY A P I EET-TY APT EETTY A PAEET PTE PTEETT Y APTEETTY APTEETT YAJT 1010 

Ouary: erpjhhspsoaskcspshkkhsspseiisviispsoeoihcsppepschslserglhspsop-Sh i*3 

E **P*» « *Pt * P*E ♦ »P*+ * P E + ++ +E *+P** * 
SbjCt! 1011 t ETTYA PT EETTY APAEET P YEPTEETT Y A PT EETT YAPTEETTYAPTEETTYAPTEETT 1110 

Ou*iy: 194 IWPSQPJUIKSPSEItSHUPSEMHUPSERIUIRSPSOASKRCPSERSHCSPSEAPJinSPS 653 

p»* * P+E * *P*E • »P*E *P + ♦ CP*E * *P*E +P+ 
Sbjct: 1111 Y AP AEXTPYEPTEETTY A PT EETTYAPTEETPtY A p I EETT YC PT EETTY APTEATTY APT 1170 

Quaiy: «14 OUHIUPSElUWHSPSKIUKKSPAP.ItSHP.SP3CIISHHSPSERSHHSPSEKP.HH3PSER3H 713 

♦ ♦ P«E ♦ P» * *P * *P+E ♦ "P+E ♦ +»P+E » PtE c ^ T 11J0 

Ouary: 714 C3PSCMKC3PSEPRKR1PJE1WIH5PJEICSHKSPSE«HHJPSEPJULHS»LEKSW<SLL 773 

.p.E . P+E tP*E *»P*E»* .*P*E * +'P * * P E * ♦* 
Sbjct: 1231 Y APT EETT YEPTEETTY A PT EETT YA PTEETT Y A PTEETH TAP! DETTYG PTEETTY APT 1190 

Ouary: 774 EUHR3PSEIuUHK3rEIUHRPISEIUHSPSEUHUPLEMnCSPSEP.KCH33SCKTCH 133 

E ♦ +P*E + E E *♦ P+ **■ +P E + *P*E ♦» *T * 

SbJCti 1291 EATTYAPTMTPYAPTE ETTYEPTCETT YAPTtETTY APTEETTYA PHEET t Y 1343 
WO 01/12659 

PCT/IB00/01496 



Id«nti 




324 PSEMQRSSICIUWHSPSOWMCSPSPKBHSSPSERSWMPJOMHCSPPEMCHSLSEJI 

n* * c •« *p e ♦ •*•»• • e + * **e 

303 PSDETCAPT -EGTTYVPItLrTTAAPSEDTTYAPFEVTPY APTEFPY- -DVEETTY- VTEL 3JI 



3*4 GLHSPSQHSHRCPSOJWIHSPSE*" 

HP»* P««P, H*+ E 

359 STYArTKSETNAPTEJUfHYAKIE 



"IHItsrSEKSKMFSEttlutUPtOMHItCPS 637 
* *t*t * +P*E *P** * P* 
PCDTCVTMYAPTEETTYAPTEETTYAPTEETTYAPT 4 It 



£39 ElUHCSPSEIUtHIISKOMHRCrsCIUUIHSrSKKSHRSrAMjHMrSUUHKSPSEKSH 497 

e * r*E *r*« • P*£ +*p**+* *P * *t*t * * t*t * 

419 ECTPYEPTEETTTTPTEX7TT APT IXTTYAPTEKTTYAPT EETT YAPTEE7PYEPTEETT 411 

691 HSP*EI««KSPSEJ«HClPiE««CSPSEJ(KHltJPSEPRHKjmKSHKSMCPlKMSP3 751 

♦» r +, + ♦ *E ♦ *P*E +P*E + P+Et* *+f*E ♦ **F+ 

47* Y A PT KETT Y APT EETT Y AJTEETT Y A PT EETT YAPAEETPYEPT EETT YAPTEETT Y APT S3I 

751 EJUIMSPLE»IUt>UJJtSHRSrSEIUI3HIUrER5-KMIIIJtSKSPSEKSHUPLEIlSA Sl« 

E **P E ♦ E ♦ *P»E * E + * E *P*E ♦ 

539 EETTY APT EETTYA PTEETT Y A PAEETPY I PT EETTYA PTEETTY APT EETYfYAP I EETT 5?l 

• IT CSPSCRRCHSiSCKTCHSPSCASHMPSOIPOGPTSERJHRSSCERTPJHSrtEMPPCRPS Bit 

*F»E + ♦ ♦ *T • P*E ♦ *P* +E ♦ *S E T ++P*E p+ 

J99 Y APTEETTY A P AEETP YE PTEETT Y A PTE ETT ¥ A PT EETTY A3TEETTY A PTEETT YAP A £51 

ITT CMHHCSPSCUlUUPLKECLKYSrKEAPSNS »0t 

p+c ♦ *t »e y* r e • 

£59 CETPYE PTEETT YAPTEE-TTYA-PT EETT Y A HI 



Identltie* • 667J21 

°" ^ E T P+E T * Pt +P*E E **p«t • ♦ ♦ • P+E ♦ 

SbjCt: 1059 EETT YAP AXET PY E PTCCTTY A PTEFTTY A PTEETT YAPT EETTYA PTEETTY AP AECTP I1H 

Ou«ty: 562 ASPSOPJIHCSPPEASCHSUEBCLKiPSOPSHUCPSQWWMSPSEPSHMPSEASHMPS 621 

p»t *P E * ♦ + *E *»*P * ♦ CP+ + ♦ ♦P+E + +P+E « + P+ 
Sbjct: 1119 Y EPTEETTY APTE ETT Y A PTE ETWY A P I EETT YC PTEETT Y A PT EATT Y A PT EET PY A PT 1171 

Query: «2 CPPHRSPiORSHKCPSERSHCSPSCARHPJPJOASHRGPSERAXHSPSKRJH WPARAiH £11 






Y APTEETTY A PT EETT Y APT EETTYA PTEET PY E PTEETT YAPTEETT 1131 



612 MP3ePSKHSPSEA\3KHSPSEPJU<H3F3EWHC3PSEMHCSPSE*PJt*SPSEPJUHHSF3 741 

P+e + t+p.E * *+p+e +*p*e ♦ *t + * p+e +p+e **r* 

1239 Y EPTEETTY APT EETTY APT EETT YAPTEETHY API DETTYCPTEETTY APT EATTYAPT 1291 



1299 EET PYAP7EETTYE PTGETTY APT EETTYA PTEETT YAPMEETPYEPAEEST3TV3TCKP 1358 



79> C*.SHSFSEKSKL3PLEfc5ACSF3E 111 

E » P++ + P + P** 
1359 CJfT EE FTDEPT DEPTDE PSDEPTDE PTO 13S£ 



Query: 501 ERTPRGPSEItT1UtNPSHPJtHASFSER3Q«S3lEIUU(H3PtOUHCSPSH»tHSSFSEP.SH 5(1 

E T P+E T ♦ P+ + P+E * ♦ E + + P+ + * P* * *P*E ♦ 
Sbjct i 1075 EETT YAPT EETTYA PTEETT Y APT EETT Y APTEETTY APACETPYEPTEETTYAPT EETT 1134 

Outty; SCI PaP»CA>lHC3PPEMCHSLSCW;U(SPS0PSHRCPS0RAMH3PJERS«AlPSERXHR3PS £21 

♦ P** *P E » * »E **P*» • t** «P+E ♦ t* * *P* 
SbJCt! 1131 YAPTEETHYAPI EETT YC PT EETT Y A PT EATT Y A PTE ETP YAPTEETT YE PTCCTTY A PT 1194 

Query: £22 EAHH*3PSQRSMP,<:PtERSHC3P3EPJUIASPSQX3MRCP3EltKHKSPSKP3KASPAAIISH t)l 

E »P++ * P*E ♦ *P«E F++ * P«E « P+* * +P * 
Sb)Ct: 119S EETT YAPT EETTYA PTEETTY APT ECTPYE PTEETT YAPTLETTYEPTEETT YAPTEETT 1254 

Query: ««2 ASP*EI«HHSPSIJl3HK3M£W^KK3PSEIUMCSPSEASHCSPlEWUilt3«EP.AHHIM 741 

*P«E » **P»E * **P + ♦ P+C * *P«E ♦ *P*E *P*E * P* 
SB]Ct: 12SS Y APT ECTTYArTEETHYAP I rjETTYCPTEtTTYAPTLATT YAPT EETPYAPT EETT YEPT 1314 

Query: 742 EKSKHSPSEKSHMSPIEHKMISPLEASIUISLLCRSHASPSCPJtSHlUrERSHMISERSH £01 

Qoary: 802 3PSEF.SHL3PLCA3KCSPSE 121 

PS+ + P * P+ + 

Sbjet: 137S EPSDEPTDEPTDCPTDIiPTD 1394 

Idantlt 

Ouaty: 114 GLHSPS0PJH1«rSQIUWK3PiEPJM*3PSEASHMP3EMHP.SPS0MHrcP3EWKCS (43 

G + PS ♦ P*+ * P C ♦ + PSE + + P £ ♦ **+• ♦ C++ * 
Sbjct: 299 CCYtPSDITE-APTCCTimWmAAPSIOTTTJIPKVTPYAPItlcPY-CWCrrTlfVT 316 

Query; 644 PSCRIWK3PSWNIWPSCIUUIHSKIl»KlUPMUtSHRSPSEltSHllSPSEXSHItSPSCK 703 

E *P++ P+ER H + + ++ + ♦ + P+E + ++P+E ♦ + + P+E 

SbjCt! 351 - - E E3TY A PTK SETS APTERMM Y AH I CK PC DT E V- - THY A PT E CTT Y APTEETT Y A PT EE 412 

Ouary: 704 PJUSPSERSHCSPSERSHCSPSENUIftSPSEP^KSPSEKSKJISPSCRSHHSFSERRRHS 7*1 

+ *F*E + P+E + *P+C *P+E + + P*EF* ++P-E ♦ +*P»E * 
»bjCt i 413 TT Y A PTEET PYtPTCETTYT PT EETT YAPTEETTY AFT CATTY A PTEETTY A PTCETPY C 471 

Outcy: 7C4 plE»RH3LLEIl3HRSFSEIUUHR5reRS-KMISEP£H3PSEKSHlSPl.ER5RC3FSEP 122 

r t * .+ ♦ • +p*t <<»:*♦ E *++P+E+* P E ♦ +P*E 
Sbjet: 473 PTEETrYAPTKITTYAPTEETT Y AS TEETTY A FTEETT YAPAEETFYE PTEETTY APT EX 532 

Ouary: 113 I^SSSCKTCKSPJlRSMPSPSGMAOCPTSEMKRSSCEKTWISPSWRPGPPSCPJfHCS 112 

* *T **P*E t »P. +E + E T «»P«E P* + 

SbJCt: S33 TTY A PTEETTY APT CETTY A FT EETTY A P AE ET PY E PTEETTY A Ft EETT Y A PTEETH Y A 592 

Ouary: IS 3 PJEJUFJlSFLKECLJIYSrpeEJtP 90 i 

PC* »P »E Yt e f 
Sbjct; 193 p leettyaftce^TTyapaeetf 614 



Quary: 716 PSEJtSHCtPSEIUlXMPSERIUOISP3CKSKHSPlEP.SHH3PSERMHSPLCUAMSU.CP 771 

PS* ♦ + P+E P C »PSC + ++P E + ++P+E* «C * ♦ * E 
SbjCt: 303 PSOETE-APTECTTYVPPXETTAAPSCOTTYAPREVTPYAPTEKPY0--VCETTY-VTEE 1SS 

Ouary: 776 SHR3 PS E RUSH WfTllSHlUn SEAS NSF3EK3KLSPIXPSACSPSEARCKS3S 129 

3 .p., ♦ . n H E+ *.ft+* *P E + +P+E *+ • 

SbjCt: 319 STY A PT P SCTKA PTC AMH Y AH I EEPCDTCVTMY APTCETTYA PT E ETTY APTEETTY A PT 4 IS 

Ouary; S29 GKTC KS P S CPS KRSF5GMAQG1T5 ERSHRS3C CRTftKS P3EWPGRP SGRWHCS PSERS R ISS 

+t * P+E ♦ +P+ +E • • E+T ++P+E P+ P*E ♦ 

SbjCt: 419 EET PYC PTEETTYTPTEETTY APTEETTY A PT ERTT YAPTEETT Y AFTEXTPYEPTEETT 47S 

Ouary: B89 RSP LKEGLK Y3 rpCEKPSH S L3 RD 912 

+p KE Y+ P E *+ + * • 
Sbjct: 479 Y AFT KE - TTY A - PTEETTY ASTEE 500 

Pedant information for DKFZphtes3_1g11, frame 2 

Report for DKFZphtes3_1g11.2 



E53LS I rYDREDLVPMECSCOSOSDSgTRI 3ESQHS LKPNYLSQAKTDF3EQFOU.EDLG 

IK 1 AAKLLRSQ1 PP0VPPPLASSLVLKYPICL0CCRCSCLNCHHKL0TT3CPYLLI YPQL 

HLVRTFECHC£VTU*l£riUJUUWQXSXYRERDRFV I RR3FI SF3QKXAM YTOA3KS 
hcecccceccc cc ecc cceaa cceeccccccccccc •■ •••cccccechhhfthhhccccc 

PTSTI Di/OSG P30S PAPVQVY I RRCORS APDLVEK TKTRAPGHY EITOVHKL FE3D3E3T 




ccccce ceeeccccccc »••« e« • ccee eechhlthh he ecccc •••••« e 

OWEKIUllWllTWrrSMFYFWRITKRMKKRXrrntlRTTIESMMI^I^KIIlCAT 

hhllIl^lh(l(lhhccccccccccchhh^lr^h^hhhhecccecccccccchhhh hhtmhhhhhce 

0TST»SLKK)PKKP»OPKm]l.LroSLK(»«nWlI<VIASVCRI[PV0CTRP»ILir*SKNT 
f PKQHfcKlttCLFSSI KBDKHSAD«.TP*GSTItOCOI UrCCTVOCMAQQPIWATirQP* 

plrlprft dsosc t krorruvca plrtvqkds jsk wchf* mi etssocs kkist pct rv 

0* PICO I L PGS PVK RTtM KM Lit DRtT Ft KEHNX M fK KIATP RCPS E *T WW FSWKMHftS 1 1 
ER*0«S SL EMtHH S PSO*S HC 5 PS RKKH SS PS t*S WM PSQWJHCS P FCASCM S L SCKC L 



RSP3QKS KftCFSOftftHKS PS E»S KM PSIKSH US PBEMUtUPSOMM RGPS EXSHCS PS 

ccceece ececcccceec e cccceccceceeec ce eeeecccccceecccceeceecccc 

EltWRSPSOIlSHRCPSEnilHIISPSKiaHUPMlRSKMPSERSHKSrSEnSHHSPlERXM 



PU*H3Ll,tK3HlUFSEft*SK»rER3HltJU SI WMPSEKSHLJ PLERSKSPStltltCM 



SSSCKTCttSPSEUHUPSCHRWHTSERSHItSSCENTMISPSDaPCRPSaMIHCSPSE 



RStlWPLI<ECLKVirPCEP.PSMSl.SROrW10TTLLCTTHItKPKACOVWIlPCATR 



F30001T «»->««7 »TF_GTP_* POOC0M1T 

(No Pfam data available for DKFZphtes3_1g11.2) 



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9 roup: ccitci ilttli 



pi nxrly Identical to Minn KIAA0I7 



(ind application In itudyino. c 



MAA0O7, alcarniU** apHead 



101 CCTTATGAAA CACTACAGCC CCACCCACTA CGTCAATTGG TTCCAACAGT 

151 ATAAAGTTCG CCAAAAAGCT GGGTTAGAAG CCCCCAACAT TGTAGCCTCG 

201 TTCTCAAACA GGTT X TTTTC AGACCACGTT CCTTCTAATC GCTTCACTG* 

251 CATTCAGAAC CTTGAAGGAC CAGACATTTT TTTTCAGGAT CAACTCCTCT 

101 GTATCC1AAA TATCCAACCA AGAAAAGCTT TCACCTCCAA ATACTACGCA 

111 AAAAAAATTC TTTACTACCT CCGGCAACAC AACATCTTAA ATAATCTTAA 

401 ccccTTtcrr cagcacccac atcactatca ctcctatctt gaaggtgctg 
in tatatattca ccagtactgc aatcctctct cccacatcag cctcaaacac 

301 ATCCAGGCCC AAATTGACAG CATCCTGGAC CTT C TTICCA AAACCCTTCG 
Ml GGGCATAAAC ACTCCCCACC CCACCTTGGC CTTCAACCCA GGTCAATCAT 
(01 CCATGATAAT CGAAATACAA CTCCAGAGCC ACCTCCTGGA TGCCATGAAC 

m tatctccttt acgaccaact gaacttcaag cccaatccaa tggattacta 

701 TAATGCCCTC AACTTATATA TGCATCAWT TTTGATTCGC AGAACAGGAA 

111 TCCCAATCAG CATCTCTCTG CTCTATTTGA CAATTCCTCC CCACTTGGCA 

■ 01 CTCCCACTCG ACCCTCTCAA CTTCCCAAGT CACTTCTTAT TAACCTCCTC 

131 CCAAGGCCCA 6AACGCC C CA CCCTGGACAT CTTTCACIAC ATCTACATAC 

»01 ATGCTTTTCC GAAACGCAAG CAGCTCACAC TCAAACAATG CCACTACTTC 

(51 HTCCCCCAOC ACGTCACTGC ACCACTGTAT CCCCTCCTCA ATGTCAAGAA 

1001 CCTCTTACAG ACAATGGTCC CAAACCTGTT AACCCTGGGG AAGCCGGAAC 

1031 CCATCCACCA CTCATACCAG CTCCTCACAG ACTCCCTCGA TCTCTATCTC 

1101 GCAATGTACC CCCACCAGCT OCACCTTCTC CTCCTCCAM CCACCCTTTA 

1111 CTTCCACCTC CCAATCTGGC CACA&AACTC mC T C T C TT G TTTTCAAGG 

1201 tCCTTCACAT CCTCCAGCAC ATCCAAACCC tACACCCGCG CCACCACCCC 

1131 CCCGTGGGCT ACCTCCTGCA GCACACTCTA CACCACATTC AGCQCAAAAA 

1101 GGAGGAGGTG CCCCTAGACG TGAAGCTGCG CTCCGATCAG AAGCACACAG 

1351 ATGTCTGCTA CTCCATCGGG CTCATTArCA AGCATAAGAG GTATGGCTAT 

1101 AACTCTCTCA TCTACGGCTC CGACCCCACC TGCATGATCG GACACGAGTG 

1431 CATCCGGAAC ATCAACGTCC ACACCCTGCC CCACCCCCAC CACCACCCTT 

1301 TCTATAACCT CCTCCTCGAG CACGGCTCCT CTCCATACGC AGCCCAACAA 

1131 AACTTGCAAT ATAACGTGGA CCCTCAAGAA ATCTCACACC CTGACGTCGG 

1(01 ACOCTATTTC TCAGACTTTA CTCGCACTCA CTACATCCCA AACCCACAGC 

1«31 TCCAGATCCG CTATCCACAA CATCTGGAGT TTCTCTATCA AACCCTGCAO 

It 01 AATATTtACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTA&AGAG 

17S1 CACATTCCAC CTTTGCTGCT CCTCCTArCT TCCAAGAGAA CGGGACTCCC 

1S01 CAACAAGACC TCTCCACCCA CCCCTCGGGA C 

1131 ACTCCACCAC TACTCCTGCT TGI 

HOI TCTTCCCCAC CTGCAAAGAC AATCTTCCTC TCCOCCTACA CTACTOAATT 

1951 AATCTGAAAG GCACTGTCTC ACTOCCATOC CTTCTATGCT TCTCCTCTGG 

2001 TCACAGTTTG TGACATTC*5 TCTTCATGAG GTCTCACAGT CCACCCTCCT 

2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT CCATTTCTCT 

2101 CAGAACATTT CCTTCGCTGG ACAGATGCCG TTATGCATTT CCAATAATTT 

2131 CCTTCTCATT TCTCTGTGCA ACGT G TTCCG TCCCCACTGA CGACTCTCTG 

2J01 TCTTTTTACC CTGAAGTTAG TTCCATATTC AGACCTAAAG TTCTCTGCTA 

2251 TCtTGCCACC ATCTTAGACA TCCACACATT AACAACCTAA TCCTAATTAG 

2301 AATCATTTGA ATTTATTTTT TTCTAATATG TCAAACACAG ATTTCAAGTG 

2151 TTTTATCTTT I I 1 I 1 1 (H A AATTTAAATG GGAATATAAC ACAGTTTTCC 

2401 CTTCCATATT CCTCTCTTCA CTTTATGCAC ATCTCTATAA ATCATTAGTT 

2451 TTCTATTTTA TTACATAAAA TTCTTTTACA AAATGCAAAT AGTGAACTTT 

2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATCACT 

255 1 ACTTTTAtTT TTTAATTTAA AAAATCTACT TCAGTATCAT CACTACCTCT 

2C01 TACATCAGTC ATGGGTTCTT TTTCtACTCA CACATACAAA TCTCATGTTA 
Peptide information for frame 3 



ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 
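
The ORF coordinates, reading frame and peptide length reported for each entry are tied together by simple arithmetic, which is useful for checking a damaged header line. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes 1-based inclusive coordinates, a stop codon that is not counted in the ORF span, and frames numbered 1 to 3 by the position of the first base.

```python
def orf_end(start_bp: int, peptide_len: int) -> int:
    """Last base of an ORF, given its first base and peptide length.

    Assumes 1-based inclusive coordinates and that the stop codon
    is not part of the reported span.
    """
    return start_bp + 3 * peptide_len - 1


def reading_frame(start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) of an ORF starting at start_bp."""
    return (start_bp - 1) % 3 + 1


if __name__ == "__main__":
    # Earlier entry in this listing: ORF from 202 bp to 876 bp, 225 residues, frame 1.
    print(orf_end(202, 225), reading_frame(202))
    # Entry above: ORF starting at 105 bp with a 544-residue peptide.
    print(orf_end(105, 544), reading_frame(105))
```

Under these assumptions the 202 bp / 225-residue entry reproduces its stated end position and frame, so the same relations can be applied to entries whose header digits are illegible.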



1 WXHYJPTDYV NMLEEYKVftO KAGMJUtKIV AlFtKRITIC KVPCWCFSDI 
SI ENLEGPEITF CDCLVCIUM ECMALTVKY YAKXILYYLft OOKILNHLKA 
101 rLOOPDDYCS VLCCAVYIDO TCHPLSDtSl. KOI 0*01031 VELVCKTLRG 
111 IKSftXPSLAr KAGE35KIHE lELOSOVLOA myvlydolk ntcwmOTYM 

ioi klmumnovl imtcimw sllyltiaro lcvplepvkf rsHn-uwco 

251 CAECATLDir DYlYtDAFGK C*0LTVKE« YUCQMVTAA LYGWNVKKV 
101 L0WXVCNLL3 LCKREGIDQ3 YOLLAKLDL YLAMYPDQVQ LLLLQAHLYF 

j51 hlghpeksr clvlkvloil qhiotlopgo hcavcylvqh ylemierkke 

401 evcvevkuts dewhtovcyi igllkkjikay cywcviygwd ptomghewi 

451 mwmvhslph cmhopfyhvl vcccscryaa qenleykyep qeismpdvcr 

soi yfscrtcthy iphaclctry peolefvyet vonitsakki nide 

BLASTP hits 

No BLASTP hits available 

Ai*rt IIASTP nlti (or OKripnt J, lew J 

TRQ<BLMEMtAa020<l2 1 a«n« : "MAAOiTS"; product: -KIAAOilS prot«in"j 
Hono »*pl«i» mKNA 7or KIAAOITS protain, partUl cdi., N • 1, Scot* - 
2132, ? • l.Ji-JIS 

>TMH»LKDI:Aa020tl3_l otn*: -KIAAOtYi-j product: -K1AAQHS prottln'; hok> 
(•plan* aMHh for KIAAOtU protain. p*rtUl cm. 
Lcnqth - (21 

HSPi: 

Scora - 1131 <««.» biti). Cxpoct - S.Sa-I»S, P - S.i»-I95 
IdantUiat - 5J7/J44 (Sill, Politic* - S3V544 |S>a* > 

Ou.ty: 1 MKHYSPTOYVWWLEEYKVItQKACLEAWIVASFSKRFrSEHVPCWrSOIEIlLECPEirr 60 

KWtYIPTOYVlWLEIYKVMItACLEAHKIVASrSKRrrSDIVfCKCFSDIEltLECPEirr 
Sbjct: IS KKKYSPTOYVWlXIYICVI«ltAGLEAW(IVASFS«rrSCHWPCJKrSDtElfLtOPEirr m 

00* ty: «1 EDCLVC lUmECKtALTWrYTAIHI LYYLHOO* t LmfLKArLOOPOOYtSTtECAlfY 1 00 110 

EDELVC t UmEGRKALTMKYYA*KI LYYLRQOKI LNHLKAFLOQPt>0YE»YLEGAVY I 00 
Sbjct: US EOCLVClWMEClWALWItYYAKltnYYLROOKILllMLItArWQPDOYESYLEGAVYlDO 10* 

Ou*ry; 111 V^PUDISL|I(>IO^0II»IVEl-VCKTLI(CtKS)UIPtUAFXAGESMtIHtICL0S0VL0A 110 

YCHPL301 JLKDIQAOI DSI VtLVCKYtUCWSRHMLArRACESSMIMI ELQJQV1.0A 
MlJCt: 205 YCHPUDI>lJtDigAOt03IVELVCrrillCIMSlUIPlIJ^rKACES3KIHEIEUSOVLM 2*4 

Ouary: 111 NNYVLYOOLKFK^WOYYMALML YMH0VL1 RPTGI PI 3I1SLLYLTI ARQLGVPLEPVMF 1*0 

MH YV L Y DOLKFKGMKHDY YNALNL YMHQVL I R RTG I P I SN3LL Y LT I AROLCV Fl C FVN F 
ibjet: 2*5 MHYVLYOOWt FKGMRMDY YNALKL YMWOVLJ (UtTCl H SM3LLYLT IARQWJVPLEPVM F 324 

Query: 141 PSH FLLRNCQGAEGATLDI FOY I Y I GAFGXCKQLTVKEC EYL I GQHVTAAL YCVWVKKV 100 

F-SK FU.RWCOGKECATLD IFDYIYI DAFGXGKOLTYKEC EYL I CQKVTAALYCWKVKrV 
ibjCC: 3IS PS M FLLRUCQGACGATLO I FDY I Y I DAFCKGKOLTVKCC EYL t CQHVT AALYGVWV K KV 114 



SOJCtl J8S MIWVGKLLSLGKRtCIDOSTOt.LltOlLDt.TLAKYPDOVOLLLLCAUYrHLGlHrCK-- 4 41 

Qu*cy: 1C1 C LV LX VLDI LQH I C?t*D PGQKGAV6Y LVQHTLCH t tWt KEEVG V EV KL P3 OEKH ItDVC Y 3 420 

VLD I LQH I QTLUPCOHCA WCT LVQHTLCH1EKX KEEVCVCVKLHSOCKH *OVC YS 
SbJCt; 443 VLDtLOHlOTl^nWHUVCYLtfOHTLCHIEKKKCCVCVCVKLKSOCKMItOVCiri 497 

Quiry) 411 1 GLI KKH X HYGYHCV 1 VGVD PTCW4GH CM t PJHQJVHS L PHGHH 0 PFYW VLVCOGSCPY AA 410 

1GL1MKHK AYGYNC V t VCVD PTOO«H CM I JUMKVHS L PKGKH Q P mtV L.VEDG3Cft Y AA 
SbjCt! 4M tCl.IKItHltl(YCYNCVIIC»IOPTC»»C)(EiriFlMHNV>tiLPHCMHQPrYMVLVEDaSCRY** 1ST 

Qu«ry: 4*1 QCK LCY N V CPQE I S HPDVGP. E rTGTH Y I WIACLE I PY PEDLEFVYETVOH 1 YSA£K C MO 

OWLCY MVEFOEI INPOVCP-YrSEFTCTMY I PNAELEI PY PEOLCfW CTVQNI YSAKKC 
SbjCt! lit Qf»LCYI«VEPCEI JH^DVCKrrSCfTCTHTI PllA£LtI BYPEJLtrVY CTVCWI YSAJtKI *n 

Ouary; 341 HIDE 144 
HI DC 

Sfcjct: (II HI DC (21 

P«d*nt lnfonutlon for DXripht«.3_lgl. iruu 1 
Report tor DKrtphn»J_»g5. J 



1 t3101.II 
I 3.12 

40L| TRCMBL:AB320tt2 1 qcna: -KIAAM7S-I product: "KIAAOOS pro t tin 

I tor KIAA0I79 prottin, partial cdt. 0.0 



NKHYSFTDYVNMLCEYKVItgKAGLCAJtX J VASMKRrrSEHVPCtlCFSDl EMLECPCI rr 

CDdVCIUMCClUCALIVKYYAKir I LYYLPQQK 1 LHHLKArLQQPDDYES Y LECA VI 1 DO 
a ■■•*«••• cec hhhhhhhhhhhh hhtihtihhhMitihhWihhc ccccc* • *ccaa • • •* • 
Y«PLSDlIU(0IOIWID*IViaVCKTU«IMSIU«PSLAn«CE3SKIM:ilL0*0VLO» 

MNYVLYDQLKntCKHICrrMAlJtLYNHQVLI PPTG I P1SK3LLY LT X AP.QLG V PLC PVM F 

bhhhhcecccceeeeehhhtihhh nhhnhn hbhhccccchiihhhhhlihh hhccececcece 

PSXrUJOICOCUGATlDI PDYI YIDAFGKGKOLTVKECEYLI GOHVTAAlYGWjrVWCV 

LO PKVCHLLS LGKKCC1D03 YOLLRDSLDLYLAM Y PDOVQLLLLOAW.Y FHlGIHriKSr 
l^hhhccchhhhhhhteceeecerinhlihlihii^ 

CLVLKVLDILOHUJTLO PCOHU VCY LVOHTLEHI EWTKECVCVE VKLHSMKHRDVC IS 
• Wihhhhhnhhtih hccrcecccchhrMhbh Wihhhhhhhhhh • ««ccccee «•••« 

I CL tKKHKKYCYHCV I YCHDFTCIMCKEM I KMKHVKSL PHGHHQP FYHV L VEDCSC P-Y A* 
cccctiMihhhhe* mcccccccc tihtililihhhhhhccccccccccc***** cccc ••* • 
OCNLEYHVEPOEI SHPUVCRYrjEFTCTHY I PMA 



■ It* d»t* »iil*blt for 0*np)lt»»]_*()i.]) 

■ d#t* tvtilibl* for Dnrxphi«»3^ti)$. 51 



icctihhithtitihhtihhbccccc 
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or.FipJit.tj]_*»lo 



group: noclalc acid « 

protein with strong similarity to 

The poly(A)-binding protein PABP1 binds to the messenger (mRNA) 3'-poly(A) tail of 
eukaryotic mRNAs and together with the poly(A) tail has been implicated in governing 
the stability and the translation of mRNA. 

Ion of aJUIA translation *nd 



strong similarity to polyadenylate-binding protein 
frame shift at Bp 707-710 
Saquanead by tMdiCanoaU.il 



signal at pos. 2033 



CCCAAACGTC BCGGCTTCTC TCCCTGCCCC CACCCCTGCC GAGAATCAAC 
CCCAOCACCC CCACCTACCC AACGGCCTCC CTCTACCTCC CCCACCTCCA 
CCCCCACCTC ACTGAGGCCA TCCTCTACGA GAACTTCAOC CCCCCACCCC 
CCATCCTCTC CATCCCCATC TOCAGOCACT tCATCACCAO CCCCTCCTCC 
AACTACGCCT ATCTGAACTT CCAGCATACG AAGGACGCGG ACCATGCTCT 
GGACACCATC AATTTTGATG rTATAAAGGO CAAGCCACTA CCCATCATGT 
CCTCTCAGCG TGATCCATCA CTTCCAAAAA CTGGACTGGG CAACATATTC 
CTTAAAAATC TCCATAAGTC CATTAATAAT AAAGCACTGT AtCATACACT 
TTCTGCTTTT CCTAACATCC TTTCGTCTAA CGTCCTTTCT CATCAAAATG 
GTTCCAACGG rTXTCCAtTT CTACACTTTG AGACACACCA AGCAGCTGAA 
ACAGCTATTA AAAAAATCAA CGGAATGCTC CTAAATGGTC GCAAACTATT 
TCTTCCACAA mXAGTCTC CTAAAGAACC ACAACCTCAA CTTOCACCTA 
GGGCAAAAGA CTTCCCCAAT GTTTACATCA ACAATTTTGG ACAACACATG 
GATGATCACC CCCTTAAGCA TCTCTTTCCC AACTTCGGCC CCGCCTTAAG 
TCTGAATTAA TGACCCATCA AACTCCAAAA TCCAAAGGAT TTCCATTTCT 
AAGCTTTCAA AGGCATCAAC ATGCACAGAA AGCTCTAGAT GAGATGAATG 
SAAACCAGCT CAATGCAAAA CAAATTTACG TTCCTCCAGC TCAGAAAAAA 
GTGGAACGGC ACACCCAACT TAAGCGCACA TTTGAACAGA tGAAGCAASA 
TAQGATCACC ACATACCACC TTGTTAATCT TTATGTGAAA AATCTTCATG 
XTGGTATTCA TGATGAACGT CTCCGGAAAC CCTTTTCTCC ATTTCCTACA 
ATCACTACTC CAAAGCTTAT CATGGAACCT GCTCGCAGCA AAGGGTTTGG 
TTTTCTATCT TTCTCCTCCC CACAACAAGC CACTAAACCA CTTACAGAAA 
TCAACCGTAG AATTCTOCCC ACAAAGCCAT TCTATCTAOC TtTAGCTCAG 
CCCAAACAAC AGCGCCACCC TTACCTCACT AACGACTATA TGCAGAGAAT 
GGCAAGTGTA CGAGCTGTGC CCAACCAGCC ACCACCTCCT TCAOCTTACT 
TCATGACAGC TCTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
AGCCAAATTG CTCGACTAAC ACCAAGTCCT CGCTCCACTC CTCAGGGTGC 
CACACCTCAT CCATTCCAAA ATAAGCCCAG TCCTATCCGC CCAGGTCCTC 
CTAGACTACC ATTTACTACT ATGAGACCAC CTTCTTCACA CGTTCCACGA 
GTCATGTCAA CCCACCGTCT TCCTAACACA TCAACACAGA CACTCCCTCC 
ACCTCCTCCA CCTCCTOCTO CTGCTGCAGC TACCCCTCCT CTGCGCACGG 
TTCCACCGTA TAAATATGCT CCCGGAGTTC CCAATCCTCA GCAACATCCT 

aatccacagc cacaasttac aatgcaacac cttgctgttc atgtacaagg 
tcaccaaact ttgactgcct ccaggttgcc atctccccct cctcaaaagc 
-iagctcaa cccctctttc ctcttattca agccatgcac 



CCTA C TCTTG CTOCCAAAAT CACTGGCATC TTCTIGGACA TTCAT AATT C 

ACAA C TTCTT T*t*TCCTCC ACTCTCCACA CTCACTCCCT TCTAACCTTC 

ATCAACCTCT ACCTCTACTA CAACCCCACC AACCTAAAGA COCTACCCAC 

AAACCAGTTA ACACTCCTAC CCCTCTTCCA ACTCTTTAAA ATTCATCACA 

GACCACGAAA ACAAATTTCT CCTTCACCCA ACAAAAATAT CTAAACATCD 
ACAAACTATC G 



BLAST Results



Entry HSPOLYAB from database
Human mRNA for polyA binding
Icon - i«Z0, 9 • 0.0a*00. 1 
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WO 01/12659 
ORF from 707 bp to 1936 bp; peptide length: 410
Category: strong similarity to known protein
Classification: unset
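The ORF coordinates, peptide length and reading frame reported for each clone are tied together by simple arithmetic. The sketch below is illustrative only: it assumes the coordinates are 1-based and inclusive and span exactly the translated codons, which agrees with entries in this listing (an ORF from 202 bp to 876 bp gives a 225-residue peptide in frame 1; an ORF starting at 707 bp with 410 residues ends at 1936 bp in frame 2). The function names `orf_summary` and `orf_end` are hypothetical and do not come from the annotation software.

```python
def orf_summary(start_bp, end_bp):
    """Return (peptide_length, frame) for 1-based inclusive ORF coordinates.

    Assumes the span covers whole codons only (no partial codon).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
    frame = (start_bp - 1) % 3 + 1
    return span // 3, frame


def orf_end(start_bp, peptide_length):
    """Return the last base of an ORF given its start and peptide length."""
    return start_bp + 3 * peptide_length - 1
```

Such a check is useful when a coordinate in a report is illegible: given the start position and the peptide length, the end position follows directly.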

Front* aetirt: mr i no-lit 

MP 1 f 1 1 1 - 1 20 ) 



l urrotscKSK crcrvirtRH eoaomvdem mgkelkgkoi yvcaaokkve 

SI AQTELKATFE OWKODHITUt QfWNLYVWIL DOGIDDCALA KAFSPFCTIT 

ioi sakvhxegga sucrcrvcrs spccatkavt ehkcrivatk plyvalaqrk 

151 CEROAYLTNE YMQRHASVRA VPW0AAPP5G YfMTAVPTTO MHAAYYFPSQ 

101 lARLRPSFIW TAQGAAPKPF QNKP5AIRFG AFAYPrSTMA PASSQVPAVH 

2S1 STQAVAWTST OTVCPAPAAA AAAAATPAVA TV PRY KY AMI VWtM»M» 

301 OPOVTMOOLA VMVOCOETLT A3ALASAFK KOKCPtLGEKL rPLIOMWPT 

311 LACJUTGHLl. EIMSELLYM LCSPC3LRSK VDCAVAVLOA HQAKEATQXA 

401 vnjxTCvrrv 
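A peptide listing of this kind can be re-derived from the nucleotide listing by translating the ORF codon by codon. The following is a minimal sketch, assuming the standard genetic code, 1-based inclusive ORF coordinates, and a sequence written in blocks separated by whitespace as in the listing; `translate_orf` is a hypothetical helper, not the tool used to produce these reports. Codons containing unreadable bases are rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start_bp, end_bp):
    """Translate bases start_bp..end_bp (1-based, inclusive) of seq.

    Whitespace in seq is ignored; translation stops at the first stop codon.
    """
    cleaned = "".join(seq.split()).upper()
    orf = cleaned[start_bp - 1:end_bp]
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)
```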

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_8m10, frame 2

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931,
P = 1.7e-199

[R:14I71I pcly(A) binding protaln - nouse, H - 1, Scort • IMI, p - 



HEDAOKAVCOWGKELMCKOIYVCHAOKKVEHQTElJJttrt SO 



CI tflKODPITRtOyVMLrVKIIIAIXilDDI^IUUirSPrOTITWKVtWHXIP^KCrarVCFS no 
QKFODKITPT0 VMLYVKNLODCIDDERLM rSPrGTITSAKVttHEGGASKGrGfVCFS 

279 QKKCK>i»:TP<roGVHi.rvKi«u3DGi3D£FiJ(iitrsprcTiTjM:vjo<ECCR3Kcrcrvcrs 3JI 

111 5PtEAT1UlVTB«CPIVATKPLYVALA0IUtEIJWYLTHEYH0PJ(ASVMVPM Q 1M 

S FCCATKA VTEWtCA I VATK PLYVALAQRK EERQA+ LTH + YMQAMAS VRAVPK O 
339 SPEEATICAVTEKIWPIVATFPLmLAOP-ltEEPXJAMLTItOYPJOWMVMVfFNPVIIIPYQ 311 

175 MPPSGYrHTAVPOTQMKJMYYPPSOIKIUItPSPinnAO^PHPniMKPSAIRPCAPRV 234 

Appscrrx a»potqh aa y y p psq* a+ laps ppmtaqcapiph p row t aiap ap* 

199 PAPPSCYnUkAIPQTWIUUYYPPK]VAOLIlP5PnrrAQCMRPHPFCNKKAI*;rAAPRP 4 J* 

23J PF5Tl«PASSQVPRWJTQItVAjnST0T\fGPRP»AAAAAAATPAVPTV»PYI<YAACVRJIP 194 
PF5TMR PASSQV P A VMS TQRV AJfTSTQT +G P A P AAAAAAA TPAVATVPtYKYAAGVRMP 
IbJCt! 459 PF3TKRPAS SQV P A VMS TQRV AMTSTQTMG PAP AAAAAAA "T FA VRTV PQY K Y AAG VRVP 117 

Ou*ry: 291 OQH RM AQPQVTMQQLAVH VQGQETLT AS ALAS APPQK QKQHLGEAL FPL 1 QAMK PTLAGK »4 

OQH NAQPOVTNQO AVHVQCOC LTAS LASAP PO+QKQP(I£E AL FPL t QAMK PTLAGX 
SblCt: Sit QWl^A0VOVrHC>3PAVHVC«EPLTASKLA5APPOC0KQplLCULFTLI0AMHPTLACK 5" 

IIS I TGKLLET OKS ELL YKLE5 PES LAS KVDEA V A VLOAKQAREATQKA VH SATCV PTV 410 

t TGKLLCI DKS ELL+ KLESf ES LAS AVDEA V A V LQAHQAKEA OKAVMSATGVPTV 
}7B ITGKlXtIO>(StLLKKl.rSrCSLAXKVD^V*VTX3*MOAI(tAAC)L>VT<SATCVPTV (JJ 



:•), Exoect • l.Ja-27, I 



PCT/IB00/01496



ci*a - 11/1*3 H3»>, Polltivn • 101/163 {«») 

1 LMTDCSGKS KG f £ FV3 FERH EDAOK A VDEMNGK ELNGXQ I YVGRAQKKVE RQTELKRT FX 60 
** DE*G SKG+CFV n t A++A++*H>IG LN *++*VGR * ♦ Eft* EL * 
130 WC DERG - S KG YG FVH FCTQEAAERA 1 EKMttCKLLN DRK V rVGKFK S AKEftEAELGA RAK IRS 

61 0MI(ODIllTllYOVVKl.¥VI(NLO0GI0DEW.RIlArS?rCTlTSfcltV(«-EGGRSl(GrGrVCr 119 
* N+Y+KN * +00ERL+ r P 5 KVW E GtSKGFGIV F 

Itf tr TirVYIKNrGCDHDDERLKDLrep---AI,SVKVHTDC3GKSKCrGrVSr 33 s 

120 SS PEC ATK A VT EHKGRI VfcTKPL YV ALAQftKEEftQA YLTKE YWQ 163 

E*A KAV UOIG* * K *YV AO+R EAQ L +* 0 
236 EKHED»OKAVDEM(IGKELKCKOIYVGRA0KltVEROTELKW(FEO 179 

• 214 (32.1 bit»), £xp«ct - 1.9«-14, V • 1.9«-1< 
(1*» - JO/150 (331), Po»ltlv*i - I7/IJ0 (JIM 

■ KJKOrCrV»rE(WE0AQI«VDEMlGKELMCK0IYVGRAQKKVCRQTELKRTrECHKOC»I 67 

SO RSLGY»YVNrTOPADAERALDTHNrDVIKGKPVRIM«0 RDPSLRKS S6 

61 TFYQWHLYV KNLDDG I DDERLRXAFS P FGT I TS AKVJWEGG RS KGFGFVC TSS FEEATK 127 

V K"*KHLD 10** L f 3 TO I ! KV+ + SKG*GFV T + E A t 
97 * * -GVGN I Ft KMLDM S I DN KALYDT rSA TOti I L3CK WC DENGS KGYGFVH FXTOEAAER 1S3 

121 AVTEMKGRIVATKPLYVALAORKEEROAYL 157 
154 AIEWWGKLLNDRlTVrvCRrKSKKEREAEL 183 
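The Identities and Positives figures quoted with each alignment can be recounted from the three alignment rows (query, match line, subject). The sketch below assumes the usual BLASTP convention that the match line repeats the residue letter at identical positions and shows `+` at conservative substitutions; the function name and the toy alignment are hypothetical and are not taken from the hits above. How the original program rounded its percentages is not stated in the source, so plain rounding is used here as an assumption.

```python
def alignment_stats(query, midline, subject):
    """Count identities and positives from one BLAST-style alignment block.

    Returns (identities, positives, aligned_length).
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    identities = sum(1 for c in midline if c.isalpha())
    positives = identities + midline.count("+")
    return identities, positives, len(query)


def percent(count, total):
    """Percentage rounded to the nearest integer (rounding rule is an assumption)."""
    return round(100 * count / total)


# Toy alignment, for illustration only.
ident, pos, length = alignment_stats("MKVLAAGIV", "MK+L  GI+", "MKILTSGIL")
summary = "Identities = %d/%d (%d%%), Positives = %d/%d (%d%%)" % (
    ident, length, percent(ident, length), pos, length, percent(pos, length))
```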

- 120 {11.0 biti), Expaet - 4.R»-04, P - 4.M-04 
tilitt - 30/9* (301). Poiitiv.i • S4/M <S*») 

70 YQVV(«l.rVKNU)OGIDOERLRKArSProTirSAKVH--HECCRSltCFGrVCrS3PEEATK 127 
Y ♦ *LYV + L * + L + FSP G I 3 »V M R3 G* *V r 9 *A * 
I YpMASLYVGDLMPOVTEAKLYEKrSPAGPILSlRTCI«)MITRWLCtAYVHrgOPADAER 67 

120 AVTEKMGRIVATKPLYVAUU3SKEE-ROAYLTKEYHOPJ4 161 
A* MM •+ KP+ * *QR + !(++♦* 

(I MJfTMN rt>V I KCK PVR I KWSGROPS UIKSGVGK I riKWl, 106 



peptide length: 221
similarity to known protein

■ notlfi: RMP_1 (130-146) 



m sssrAYVwro ktki 

101 IFVKNLOKSI KHXALYOTVS AFGKILSCNV VCDEHGSKGY GFVHFETHEA 
1S1 AERAIKKMHG HLLNGHKvrV GQTKSRKERE AELGARAKEF FNVYIKNFGE 
201 DMOOERLKDL rcxrGPALSV H 

BLASTP hits

Alert BLASTP hits for DKFZphtes3_8m10, frame 3



1:14171* poly(A) binding prottin 



. scot* - 1031, P - 



>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING
PROTEIN 1) (PABP 1).
Length = 636






Qu*(y; 1 HWrSTrSTFTULrVCDUtrOVTEAMLTCKrSFACriLJIKJCROLITSOSMTATVNrO CO 

HMPS tSIV ASLrVCDLHPDVTtkMLYCXrSPMPILSIK*C1tO*ir S YAYVMrQ 
fbjCt; ' 1 (WPSfcPSYPHASLTfVCOLMPOVTlAHLVWriPWPtLSIWClOniTRPlLCYAYVKrQ *0 



OW»iy: 121 »rCMI WllVVCDEWJKGYCrWKreTHtAAEWlIKKMWMLLNCPKVrVCOFKJMCM HO 

urcNiLic wc disci KCYcrvxrrr eaaepai»wwg*llii Mvrvc+ntjwttM 
tb jet: i!i »raiiLSCKVvcos)wsp(GYcrvKrrroi*»E«*iE»«»>CHi.iJ«D«i(vrvcRrMW(EM no 

Query: 111 AEUyuUUCErPirVYIMrGIWDCEItWtOLPiGKFGFALSV 220 

AELGAMKET HVI I MirCtUHDCEPLItDLrCKrCPALSV 
tbjct; 111 ACLCAAAKErrWVYIlt 



OU*ty: 2 MPITPSYPTA3LYVCDU(P(IVTIAIU.tEKrSPJu:PIUtlltCK0I.ITSC»HYAYVNrQH 61 

•r$ «l * Lt» n g its ♦ ♦ o s * * o 

*b)ct: W 0MLIWSCVG»irilUILOMIW<l«LY[rrf«rc«IL»CPVVCOCIIGSICYGPVKrcigE 1«» 

Qu»ry: (2 TIID-A£HAljmWrDVIKCKPVlllNW-«<^PSl.--PJCSGVaiIfVWILOKJINI(KALYD 117 
♦ A ++ H + A" * L * »***KH * L D 

tbjct: 140 aa era I ekhngkllm dpx vpvgp f*K spkcaeaelgaaaki mt v y i kji rGE£4< DDE alkd 209 
Ou.ry. Ill TV»AraiIL3C>rvyCDO« SKGYGPYH FE^m C*A**a I ^tM*G* t ^LIiG*+ *■ ♦ ♦ 
*bJCt: 2 -° L FCA" FC P ALS VKVMT DE1CKSXGPCFV3 rERKEOAQKAVMMWGKELKCFO I YVCRAQKK 2« 

Qu*ry: 171 kekcaelgapakift kvt 1 « ntocbkdd e w, noire* FG PALS ill 

at* CL ♦ ♦* M*Y*KN * +DDERL* r FC s 

Ibjet: 170 VEI^IXXJUiriWK0DKItPYG<milYVIUILCIOCI0OCP.LAKCnPrGTITS 122 

Scor. - 227 (34.1 bir.il. E-p«ei - «.]••!•. P - i\3t-l» 
Idtntiti** • 57/117 (30»l, ro-Ulv.. • 101/1*7 (14t) 

Qu«iy: 12 SLYVGDLHPDVTEAKLYEKrSPAGPILSIPICPIlLirSCajNrAtVltrOKIKDAEMALOT 71 

• d» ♦ i * r gp u*** o - s ♦ -v-p. *da+ *»d 

JbJCt: l»2 MVYriWreEOtlMEWLMLreKrcPALSVKWTOE-SGKSKCFGrVSrEAHtOAOItAVOC 250 

Ou.ry: 72 twrDVIlCGKPVRIHKMA DPtLP-KMWCXtPVHtmMllMKA 11* 

JbJCt: 251 W^XElJ^KQlTVCRAOICIJVEROTrLKMri^WOOtlTIITOCV-IILYVFXUlOGIKlC* 30 » 

OU»ry: 11 J LYOTVSArCKI L5CICWCDLWG3KG YGrVH rtTKEAAEFAI KWWGMLLMGAKVFVGQnC 174 

L 3 TO I S V* * SKOCFV r • E * «A* »N»G • ♦ ♦ ♦•¥ » 
Sbjctt 310 LMEMP(CTITSAIIVWECGRJWPGnfCrSSPEMTI«VTEJWGPIVATIlPLyVALAO 369 

Ou.ry: 171 SKKCPEAEL 113 

*»ER*A t 
SbJCt: 370 RXCERQAKL 171 

(car* - 100 (ll.i 



OUaiy: I YPT ASLYVGDLM POVTEAM LY CK F3 PAG P I L3 1 R I C ADLI T3G- JStfY A YVWrOKT KOAE (( 

» *LYV *L + * L **rSP CIS** G * • »V r **A 

«>)et: ni Y<x^LrvKMLDociDOCAXAR£rsprc7iTwrv---»»(tiiG»ix[;rcrvcrsspccAT 347 

Ou.ry; (7 HALDTMK POV I KG K PVRIMf SQftDf £ LAXSGVQf I FVKJ1 L 106 
»b)Ct: J4( KA VT EMKG A I V AT K PLY VA LAQAPEE-ftQAN LTWQYI^RII 3!* 



i fat DKt~tphtii3_SalQ. 



POLYADENYLATE-BINDING PROTEIN 1 (POLY(A



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rUKC*T] 
EKU1-I le- 

niHC*T) 



04.0i.0S urne profiling (J'-end, 3'- 
EJlltSw) le-S4 

30.01 ocganliatlon of eytopl***. 
30.10 nuclear organ! tatlon |I. can 
OS. 04 traniletion {initiation, along* 

04.01.9* other airna-ttanacriptlon *ct 

reapon** IS. 



rltlai 



IS. c 



04.01.04 fttu p 
04, »! 



13- 



claiiKication not yet clear-cut 
01.1* i a combination and an* repair 
03.11 Mloii* IS. Cerevliiae, YHMI 
04. OS. 0 J Brna proceiaitig [ipl icing) 
04.0? in* transport (S. carcvitl**, YOLlUi 
10. 1J oraeniiatloa of cttroaoioaa (tiu 
99 unclatlilled ptotalna |S. etc 

OC.04 protein teieatlng, i or ting and 



lC»li»c) le-IJ 
vlaiea, rCKlSlc) lfll 
». cerevlait*, KHLUSc] 4a-0» 
S. cerevlaiae, YPKllZc] Je-01 
S. cerevlaiae, YHMa««t Je-Ol 
I 3a-01 

• iae, *H*0»s«] 3e-01 

irpi - cr ml »e-oi 

.lira (I. cerevlaiae, rCLOllc] 
'lata*. YCltlSOcl le-06 
invocation IS. care 



] 3e-06 
Y0ft43Zv| 



cpalrl 

niNCATi 

•LOCH J I 

scop] 

WM 

PIKJCH) 
PtKJCN] 

niucM] 



09.01 n 

11.04 dna repair (direct repair, bate a 
IS. caravialaa, m021w| Je-01 
03.01 call growth (S. ceceviait*. 1 
BL00010I Eukeryotle MA-blnding region 
dlixl 4.14.7.1.] Sei-lethal protein 
nuclaua 0.0 
duplication 0.0 
KM binding 0.0 
nucleoli" 2e-0» 
tandem repeat Za-0* 
elngle-atranded binding ]*-0« 
DMA binding St-lJ 
phoaphoprotein fa* 10 
rlbotoo* 3e-0» 
si tech end cion 3e-0l 
alternative ipl Icing 9a- 11 
enloroplaat Za-tl 
tranicrlption regulation Zt-01 
protain bioaynthetie la -01 
nucleoli!! ee-10 

glycine-cicn KKA-blnding protein Ze-0T 
una^igiia r ue .opto rape* -c 



LOW COMPICXITY 



HTOtSCKSKCIXIFVS rLRKE0*0K*VOCKHGKCU*CW3I YVCM0K K VIMT I « MfCO 



KKOOMTnyawMi.ifVKi' lotci ecEWJuuirs »reriTSAitv»#»c«;Rsncfcrvcrss 



cainlng preteinj le-1* 



........... .CEEEECCCTTTTltHMHHHHHTTTTCCCtXCCCCTTTCTTTEttLCTTT 

I CQ P CEATKA VTEMMGR IVATKP1.YVA LAORKtCKOKT LTM ETM0HMAS VM VPHQRAf PSC Y 



ICQ FmAVPOTOUHMyyPrSOIAUKPSPIWTAOCAnrHPfgitKPSAIRPCAPRVpreTIWf' 



SCO ^lsgvrRVKJTQ^vXJrrsTOT^rcP1«r^AAAAAAATf^v^TVP»Y^rAAOVMKW«!ucAO 



EG r<T^Tt«OU*VMVQC^CTLT»SRUkJ»PrW0K»L£t^LrPLIOAK>('^tACXITCJ<LLX 



;EQ 1DHSCLL1M.E3PES LKSK V0CA VKVLQAMOAKCATOKAVHS ATCV PT V 
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> Cor DKnphtas3_lml0.2 



Pfam for DKFZphtes3_8m10.2
RNA recognition motif. (aka RRM, RBD, or RNP domain)

• I YVCMLMDtTECDLrDl nOrCpl V* t rMMltm*TGK3RGFAFVCrC0 

♦yv+nl* *t»t lr *rs*rc i»s*++m+ e Gns*cr+rv r * 
4 l yv wiow i ode alrm fs pfgt i t s a* - - i^rskg rcrvc rss 

*t*A**A* tMMG+t* *** + V 
1 PC EATKA VTEMNGR I VAT K P LYV 143 

Pedant information for DKFZphtes3_8m10, frame 3
Report for DKFZphtes3_8m10.3



(MOtlOL] 

1) IPUP 1 

iniMCAT) 

(tTJNCAT] 
[FUNCAT] 

[ rUNCAT ] 
[rUMCAT] 
t FUNCAT] 

( FUNCAT | 

Ie-19 

iniHCATI 

(FUNCATI 

imiKATI 

[fOHCATI 

[FWCATI 

[rUNCAT) 

(rUMCATI 

[FUNCAT 1 

[FUNCAT] 

[FUNCAT] 

[FUKCAT] 

3e-04 

[FUKCAT I 

[BLOCKS) 

[scop] 
(scop i 



2K301.Ct 

e.si 

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1)
I. 1«-U3 

04.05.01 mRNA processing (5'-end, 3'-end processing and mRNA degradation) [S. cerevisiae, YER165w] 1e-64

30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64

05.04 translation (initiation, elongation and termination)



30.10 nucleer organization . [S. 
03.19 recombination and dna repeil 
11.04 dna repair (direct repair, I 
[3. cerevlsiae, YrR013«] 1* 24 
04.01.91 other »rna- transcription activitii 



>vlsiee, TES16S-I le-64 
IS. caraviaiaa, YFH02J-J la-Z4 
excision repair and nucleotide excision 

[S. Cllivlllu, YHLOlfiwJ 



IS. 



revls 



trevisiee. YGAlS9c) ) 
e, YCHlSSel la-U 
neviaiae, YGR2S0c] 1 



99 unclassified pi 

04.07 rna transport [S. cerevlsiae, YOL123« HftPl 
30.13 organization of chrewoaow structure 
» cla»*l(lcatlon not yet elaar-cut [S. cara 
03.13 atl^als [S. Ctrevisiae, YKROStv) 2*-0( 
04.99 othar transcription activities [S. cerevlsiae, YBR212i 
03.01 call growth [3. ctceviaiaa, YBB212u] 3a -01 



t-09 

IbJ l«-09 

' a, iCLOllc] »e-09 
Ic) 2e-0« 



08.01 nuelaac transc 
BL0003DB Eufcaryotlc 
BL00900D eacteriopriaga- 



IS. 



Hsiae, Y0R432u| 



dlupO 4,34 

duplication 
Rna binding 
nucleolus 4* 



(S. cerevlsiae, YDM32w) 3a-04 
binding region RNP-1 proteins 
pa RMA poiynkaraia family protaina ai 
3 Sea-lethal protein ( [Orosophil* Mlanoga* 
1.2 U1A protein (human (Hoao sapiens) 6e-24 



[SUPfAMI 
[SUPFAHt 

[SUPrAM] 
[SUPFAHl 



sitochondrion ft* -01 



i aplieing le-li 
chloroplait S«-U 
transcription regulation 3e-0S 
CTr> binding 2e-06 
heliK-destsbilixing protein la-07 



yeast hap! protein 2a-0 
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ISUrrAM) un*i*lan*d rtbonuclanprotaiB r»p«t-eont«ining piot 

[SUPFAM] polyadenylate-binding protein 1e-113

[SUPFAM] ribonucleoprotein repeat homology 1e-112

|»*0*ITt] *** 1 1 

[PFAM] RNA recognition motif. (aka RRM, RBD, or RNP domain)

|KW] All Mt* 

[KM] ID " 



DMIIW^tOmAre»IILSCI»VVCDtl<GSK!l«;rvMrtTKEAAtMl KKMWOU.LMGIt 



Prosite for DKFZphtes3_8m10.3
rsooojo is2->i»o putr_i poocooojo 

Pfam for DKFZphtes3_8m10.3

PFAM RNA recognition motif. (aka RRM, RBD, or RNP domain)

KNM • I rvCN LPatDtT ECPlr □ I norcp [viirMHiEWaTCIUltcrArVErXD 

Ouaiy 17 LTVCDUtPDVTEMtLTCKFSPHGFILSlKICROLtTtGSlHTKrVNrQH 79 

HIM EEOMIcAIdaiWCintraCRrmv* 

Ou*ty 7s TKo*tH*L(mwrcrvmcnrvmr *■ 

hmm • i won. pHOtTEcoLrDiri0fcpt**irwi ronrau Rcmrvtrtc 

i»v*«.* lo in o ♦ s»c»«fv re* 

Quary IIS irVKITLOKSliniKALrOTVSAFailLSCMWCO-'CltCSWYCFVHFET 1*1 



ECDA* kA I datuftja* TUG* rl RV* 
*E*AE*AI *KMGn»»*CK*» V 
161 KEJUCMIKKMraiLLNCftKVFV 



DKfTpht*»J_»pl 

group: testes derived

DKFZphtes3_»p7 encodes a novel 412 amino acid protein without similarity
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.
The new protein can find application in studying the expression profile



Sequenced by MediGenomix



cccjwxcccc crccctrrocT gcgtcccctg cctcctcccc cctcaccaaa 

ACACTGCCCA TGGCGCAAGG CCGGCAGCCC GACCAACGCC CCCACTCCGC 

CGGCOGCGCC TCCTTCTCCC TCACATGCGT CCAACCATTC CCTAAGCACA 

ATGTTCATTT CTCAACCACA ACACCATTTG CTACCCTTGT GGGAATTATG 

tAATATTTAT TAATATTCAA ACCAACAAAA AGACTGTACT CCAGTCTACT 

AATGCAATTC TGGGCCTCAT GGCAACTAAC ATCCCCTCTC AACTTCTGCC 

rrrrrcTCAC cggaagcta* aacctctcat ctacctatac agctttccac 

GATTCACCAC AAGGACCAAA TTCAAACGCA ACA TTCTC CT CGACTACACT 

tTACrrrcAT tcagtiactg tcgcacctac ctccctactt actcctctct 

CCCACAATTT CAACTCCCCC TTTGGAACTG CCAATCCACT ATCAMTTGT 
GTAACAAATC ACAGCCTGi* ATGCATCTCA ACCAAATCTC TTTTAACCCC 
ATCAACTGGC GCCACCTCTG CTTATCAAGT CCAACTACAG TGAGCCTGTG 
CACCATTCAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGCTCGGTGA 
AATTACCTCT ACAACATCCC TCATTTTTTA ATGAAAOGGA TGTCCTTTTC 
TCCCA 



ACCCATTCCC CCCCTGCTAC CCAAACAGGC ACACACTTTC CCCCCGAAAC 

ATGAICTAIA T OC TTT CC TT CACCCGACTA TCCATTGCTG CACTCCAACA 

ACTCACfTCT ACATTGCCTG TCAACACGGT CATCTTTTAA TCATTAATGG 

AGACACCTTC CAAGTGACTC TACTTAATAA GATAGAACAG GAATCGCCAT 

TGGAACACAG AACAAATTTT ATCACTCCAG TAACCTTGGT ATATCAGAAG 

caccccctcc TcccrrcTCC aattgatggc tttctctatt cttttattat 

TAAACATACA AGTTACATCA TCCACCATTT TCTTGAGATt GAAACACCTG 

TAGAACATAT CACAWTt C T CCCAATTATA CACTCTTCC7 GATTCAAACA 

CACAACCCAT CTCTTTATAT CTACACTTTT GGTAACCAGC CAACCTTAAA 

TAAAGTCCTA GATCCTTCTG ATGGGAAATT TCACCCAATT GACTTTATCA 

CACCTGCAAC CCAATACTTC AICACACTTA CATATTCACG GGAAATTTGT 

C TTTOCTCCC TOGAGGATTG TGCTTCTCTA AGCAACATTT ATCTCAATAC 

CCTACCAACC CTTCTCCCTT OCTCTCCATC CTCCCTCTCt GCAGCCGTC6 

CCACGOAGGJk TGGCTCGCTC TACTTCATCA GCGTATATGA TAAOCAAICC 

CCTCACGTCG TGCACAAGGC CTTTCTCTCC GAATCCTCCG TCCAGCACGT 

CGTGTAAGTC CrrrCTGCCT CCACCACCCG CTCCGTGTCA CACCCCTCTG 

TTGAAAATTC TACTCAAGCC ATCCTTTCTT TTAATTTTAA C TTT1ACGTC 

TTTCATTTGT TTTGAATCTT AATATATTCA CACACTTCAA CACTCAAAAS 

CTACAGAGGG CTCTCTACTA AAGTACCCCC CAtACCCACG TCTGTCCTTG 

CAGGCAGCCT GCTACCAATT TCTCATGTCT CTCCTCACAT GTTTtATCCA 

TGAACAAGCA AAACATAATA ACCACTTCTT TTTACTTCIA TCAATCGCCA 

TCATCTCTCT ATACTCTCCC AGGCACTTCT CCTCTATTAA CTCCATGACC 

TAAACACTCT TGTTCTCTCT ATTTCACACC TGACGAAGAT AAGCCACAAG 

GATTTTAAAT AACTTGCTCA ATACTACACA CATACTCAAT CCCAAATCTT 

CCCATTTCAA CCCACCTACT TGGCCTCCAC ACTCACTCCC TTTCCTCTtA 

AAACCACAAA ACTATCTACA ATOCCTCATT TCTTTTTTCA CTTAATCCTA 

TATCTTCCAC AATGTTTTAT ATCCACACAT AAAGACCACC CTCATTATTT 

CTATACCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 

ACCCTATATT AATCAACATT TACAGTATTT CAAAACTTTT CAACCAATAC 

TTTTAACATC ATAATAIACA GACATTAGAT TTCCACTTGT ACCTCCTATC 

ATTATTACTC T TTCTTTTT A ATTTATTATA TTATTACOTA TTAATAAGAA 

CAGACATTTC TATTCTCCTT tACAOCTTCA CATCACTCTA CCTtCTCOCA 

TCTCATCCTC AAAACACCAC TCACAAACCT CTTATTCTTA TCCCTATTAG 

ACAAATTACC CAATTCAOCG TTACACACGt CACCAAAACC ATTGTCCAAG 

ATTACACATT ACACACCTAC CACACTCACC AOCTCCCCCT CCCACTCTCC 

ACTCCCCACC TCCACCACCC TACCTCACTC COCAACCATC CATAACCTCC 

TTCCATTTAC CCCCTCCCTT TCTGCACTCT CATTTIITTC TOCCTTTCCT 

TTCTCACATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTC 

ATAAAGTTGT AGACATOTTT CACTACATTC TTCCTCCCAC TCCCAGGTAC 

CACACACAOC CTAATCAAAT OTCACACCCA CCACTAATTT OACAATTCCT 



3H1 TATTTCCOCT TCAAACATCA ACAAAGCTCT » 



BLAST Results



Peptide information for frame 1
OUT txom It* bp to ISO* bpj paptida langth! 412 



1 HATK1PCEW AMDRKLKFL lYVYSFPGLT KATKLKGWIL LOYTLUIMY 

ii cctylasysi LFr.rtuu.int wwmilckk iopgmcvnqh «rnpMnwmoL 
ioi CL3SM7VSV trriusNQEM crwutsvixp llsgs rncET DwryQUPK 

IS1 DUYGPVLFL MIAELVGItt kETrKFKDDL YFLLKFTWIC MirTSOtYIC 

201 CEESMLLMtN COTLOVTVLM KIEEttPICO JUtMnSPVTL VYQKECVLM 

251 CIDCrVYiri IKORSTMtCO IT-EIEUPVEH MTFSPMYTVL LIQTDKGIVY 

101 lYTFCKEm RKVLSkCOGK FWICriTPC TOYrMTLTYS CttCVWVLCO 

IS1 CACVSKI tL* TLATVLACCP S3L3AAVCTE OCfVYriSVT 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_»p7, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_»p7, frame 1



■taper t (or C*rcphta»J_«f>7.J 



MT» I PC EWKF5 DMLUPH TVU rPGLTHrrKLrCVl LL0TTI,L*rSYCGTYL*5YSl 

LPCTXLALtnMU S 1 1 LC KMQ PGHOVDCMJ PH PKNVROU: L33 PSTV 5 VWTI EftSWQEH 

CPMIU VKLPLEDG3 «* CTOW FPQSLPKDL IYGPVLP L3AI ACLVG UJUt F* PKDOL 
hhfihh wieceececccccccceeceececccccccccccaa aeaa ecececcececcccc 

TPLLM PTMHCWT FT30LY I GCE EGKLLKI HCOTLQVT VLMft I CEE3 PLEDRRN F 1 3 PVT L 

icq vyqkecvijuci ccrvYf n i KEiurai coruirjiPvtHKTrxpMTTVLLiaTDKCSVY 



rYTr«EFTUrKVLC*CDGFrC*It>ri TPCTOY rWT LT Y SCE I C WtLEDCACVS K t YU» 
TLATVLACC MSISAAVCTEDG3 VY FISVY0KC3 PQWVKKA njCSSVQMW 






D*rtptitt*]_9*25 



amino acid protein with partial similarity to RING-



new protein can find application in studying the expression profile of testis-specific



similarity to zinc finger protein
Sequenced by DKFZ



1 CCTCCCCCCO CTTTCGGAGC CCGGCGGCGG CCTGTGGCGC CCCCAGCCCG 

41 cgcccgactg cgcctctttc gaccttcacc gcaaacatgc gtttccctto 

101 CATCCTTTGA AATTCTAAGT TTGGGATCCC CCCCCGCCCG CCTCCCTCTT 
151 CCCCCCCCCC CCTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC 

zoi cctcccggtc TccrrrrTC* ctccctcccc ctttatcctc gcccagccct 

211 CCCCCTCCTC CTCAGAAGTC GGGGACCCTC TCGGCCTCCA CC TT C O CO CC 

joi ccacccgcgc cccggccacc atcgcgcgca agcagagcac cccgccccgc 
111 tcccggggcc ccttccccgc cctctccacc catcacagcg ccctcccccc 

401 CCCGGCAGGG GCCCCCCATT TCGGGCACTA CCGGACGCGC ccccctwcc* 
411 TCGGGCTGCG CAGCCGCTCG CTCACCTCGC TCCCAGCCAT CGGCATGGAC 
J01 CCCAGCACGC CCCOCCGGCT OCCCTTTCCC CTCTACACCC CCGCCTCCCC 
ill CCCCACCGCC CACTCCGAGA CCGCGCCCGG CGGCGGACCC TCTCCCTCCG 
401 ACTCCACCTA TGCCCATCCC AATCGTTACC ACCACACGCG CCGCGGTCAC 
til CATAGAGACC CCATCCTCTA CCTGGGCttC CCACCCTCCC TGGCGGATCC 
101 TCTACCTCTC CACATCGCAC CCAGGTCCTT CAGCTCCCAt ACTCCTTTCA 
111 AGTGCCCCAT TTGCTCCAAG TCTGTCGCTT CTGACCACAT CCAAATGCAC 
101 TTTATAATGT CTtTGACCAA ACCTCGCCTC TCCTACAACG ATGATCTCCT 
111 CACTAAAGAC CCCGGTGAGT GTGTCATCTC CCTGGAGGAC CICCTCCACC 
Ml CCCACACGAT AGCCAGGCTC CCCTCCCTGT GCATCf ATCA CAAAAGCTGC 
Sil ATAGACTCCT COTTTCAACT CAACAGATCT TCTCCCGAAC ACCCICCCCA 
1001 CT6ACCTGCG GGCTTCCTTO CTGACTCCTC TCAAACGCAC ACAGCGCCCC 
1011 TGCTCCAGCG ACCAGGCTCA CCGGACCCTG CCGCAGAGCT GAGCTTGCCA 
1101 CACCACCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT 
llll CTCCCTTCCT CCCTGAGCAC accaaattco atgagagcaa ctttgagaca 
1101 ACAATGAATC AACTGCTATt CTtCCCCTCA CCCCTCACCC CACGAGGGAA 
1211 ACCGCATtTT CM Tir C ATC TTTGAAAOGC ATTCtCGGTC TCTCTTTAAA 



No BLAST result



No Medline entry



Peptide information for
to 1001 bp; peptide length:



Classification: unclassified



I NGCKOSTAAK SUGPTPCVST DDSAVPPTGG ArHfGHrlrTG CGAMGLRSU 

11 VISVAGMGKD MTAGGVVrC LTTPASKCTC CStRAPCGGG IAS DST TAHG 

101 HCTOCTGCCH HUttXLYLCS HASLAOALFL HIAfKWFSSH SGrKCMCSIl 

111 SVASOGMDW riMCLIKFRl, STHDSVLTKO ACCCVICLCE ILQGETIAftL 
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101 PCLCIYHKIC IDSwreVMM CPCKPAD 

BLASTP hits

No BLASTP hits available

Altrt BLASTS hii» (or ORFIpriE«»)_*a22, fraiaa 3 

THEMBL:»rom:3 1 product: -PIWC-HI (lngar protain P.HA2b",r Arabldop»i» 
thai Una *ING-nT Cingar protain AHA2b PWIA, ceaplita ed»., * - I. Seora 
- in, r - J. l*-o* 

T*EMlL:*rt>7l«Ji 1 product: •RItK-HZ fing*r protain NIA3*'; Arabldopaii 
ttiallana «I»C-kJ [lngar protain RHA2a mRMA. coaplata cd»., H - 1. Scot* 
• 112, P - t.«*-0< 

>nai ■Ti1ml.ll": Afjbidmiili thaliana chrcaoaon* 
- 2, Scora -121, 

rabidopai* thaliana, N - 1, 



n T13M.23 • Arabldopai 



Quary: lit »iriU^t>ilWVLTRDMCCVICt«LWCDTI*lllPCLCIY1(lt9CIOS»revVRSCP 212 

IP* t,T D »C Hm * C IPC IYKK CI H »W SCP 

SbJCt: 106 alPSVKITPOHLTHOHSCCTVCMIXriVCGWTtLPCKMIltHRDCIVPWWUIMSCP 2*2 

Padant intonation for DXFIpnt.U_t.I2, traaa ) 

ftaport Cat DKFTpnt**3_>*22.} 

[LENGTH] 227

IHW] 237*2. *2 

(pi) 

IHONOL) HR:T0I2I» hypothatlcal protain Tilt*.!] - Arabldopai • thaliana 2 a -01 

(ruHCATl »» luiclasaltiad protain, I*, caravlaiaa. YDKI13C] «a-0( 

[FUNCAT] 30.07 organization of endoplasmic reticulum [S. cerevisiae, YOL013c] 0.001

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular
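The tagged lines of a Pedant report follow a regular layout that lends itself to machine parsing. The following is a minimal sketch for the `[FUNCAT]` lines only, assuming the layout `[FUNCAT] code description [organism, gene] e-value` as it appears in cleanly printed reports of this kind; the function name `parse_funcat` and the regular expression are illustrative assumptions, not part of the Pedant system.

```python
import re

# Assumed layout: [FUNCAT] <code> <description> [<organism>, <gene>] <e-value>
FUNCAT_RE = re.compile(
    r"^\[FUNCAT\]\s+(?P<code>\d+(?:\.\d+)*)\s+(?P<description>.+?)\s+"
    r"\[(?P<organism>[^,\]]+),\s*(?P<gene>[^\]]+)\]\s+(?P<evalue>\S+)$"
)


def parse_funcat(line):
    """Parse one [FUNCAT] line into a dict, or return None if it does not fit."""
    match = FUNCAT_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    try:
        record["evalue"] = float(record["evalue"])
    except ValueError:
        return None  # e-value field is unreadable
    return record


example = parse_funcat(
    "[FUNCAT] 30.07 organization of endoplasmic reticulum "
    "[S. cerevisiae, YOL013c] 0.001"
)
```

Lines that do not fit the assumed layout (for example, lines damaged in reproduction) return `None` rather than a partial record.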



Wtt PCCAMt r&HYKTGGCAMGLASRS VIS V ACMCKD 



PST ACCVPTCLTT PAS KGTCM CRAPCGCGS ASMTYAMCMCYOETGCCKn ROCMLl IG S 



3 XOtCV I CLIILL0C3T 1 AALPCLC I T KKSC [ D3W TCVHR5C PCK PAD 

) eeaaaaatccccccccceccccaaaaaacccMihhhWiheccecccc 

> Proalta data available Cor OKriphtail_»a22 . 31 

Ptaa for Wripfita.3 »»22. ] 

PFAM Zinc finger, C3HC4 type (RING finger)

i •CMcrcrrgloyrwrraefWilPCgKirCypCirtK ciwc* 

c ic o*+ lpc» **ci ♦* cr* 

try 11* CVIC IXELLQGDT lAALPCLC [ THKJC T DSMrtV*A»Crr.H 



group: testes derived

DKI?pM*t3_liI0 «newn» • noutl 101 •■lno tcld prottin mth iladlacity to Mmun KIAA033S 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific



complete cDNA, complete cds, EST hits
Sequenced by DKFZ

Locui: /Mp-"44.I c* ficai top of C»rl7 Unkaga group* 



I CTCCCCCAGA TCACCTCGCC ACCTCTSCCT TCAATCCGCA AATACTGATC 
11 AAGCCCCATT TATTCTCCTC TCAGGAACTC TAACTCTACC ACAGAACATG 
101 AGGCGCTAW XCTTCATCAA TCCCTTCCCT CSACCACAAG CAMTTGACC 
Ml ACATTCCCAA CCCACT6*TC AAAATGATAG ATCATOAGGC CTAAAATGAA 
101 TAAGCAAAGA ACACAAGTGG CAGAGGCTGA CAACAGAAAG AGAGGGTGGA 
111 CCCCCTCTA* ATCTTCAAGA TTAGCCTATA ATATGAGTAT »TGGCTAAGA 
301 ATTGGAAGAA TTCTCTAGGA GGCAGTAGTC AAAAAOTACA AGCAGTTTGG 
311 AAGAGTACTT ACAAATATCA ACACCCAGCT GGCTAAAAGG TGGAGCTATA 
401 CGTCATTGAA CC TCAACAAA CTCAGTCTCT AOGGCATTCC TTAACTCATC 
4)1 TGTCTAGACT TCAAA G TTCt CTAGHATCAT AATTCACAAG ACTCATCtCT 
S01 GCCAAAGTCA CACCTTTT-rC ACGACTGAAA ACAACATAGC AAAATAAGCC 
511 AAGATCTCTC TGCATCCAAT CACCTACGAG GCCCAGTTCT TTGGCTTCAC 
Ml GCCACAAACG TGCATGCTTC CGATCTACAT TGCATTTCAA CACTACCTAT 
651 TTGAAGTGAT CCAGCCC6TT CAACAGGTTA TTCTCAAGAA GCTGCATGGC 
701 ATCCCAGACT GTCACATTAG CCCAGTCCAG ATTCGCAAAT GCACACACAA 
751 CTTTCTTTOC TTCATGAAAG CACATTTTCA TAACCTTTTT AGCAAAATCC 
• 01 ACCAACTGTT TTTGCAGCTC ATTTTACGTA TTCCCTCAAA CATCTTGCTT 
■St CCTGAAGATA AATCTAAGGA CACACCTTAT ACTGAGGAAG ATTTTCAGCA 
901 TCTCCAGAAA GAAATTGAAC ACTTACAGCA GAAC7ACAAG ACTCAATTAT 
951 CTTACTAAGCA GGCCCTTCTT CCAGAATTAG AAGAGCAAAA AATTCTTCAG 
1001 GCCAAACTCA AACAGACCTT CACTTTCTTT GATCACCTTC ATAATGTTGG 
1051 CAGAGATCAT CCCACTAGTG ATTTTAGGCA CAGTTTAGTA TCCCtCCTTC 
1101 AGAACTCCAC AAAACTACAG AACATTACAC ACAATGTCGA AAAGCAATCG 
1151 AAACCACTGA AAATATCTTA ATTCCTCACT AGTCAJLAASC ACCACCCTGT 
1201 CAAAAAGTAG AATCATAACG ACTGTTCAAA CCATAAGCAC TCTTCAAATC 
1151 ATACCAGtttA CTCTTCAAAC CAACCATACT TTTTATTAGA ! I lliCI I Jtl 
1301 CAACTCTTTC TTCTATTCTC TG TTTT CCTC TTTTTTCCTC CACTTTGCTG 
1151 AGCTATGAAG TCTACTACTT TGAACTAGGC TGAAGCATCT GACTCTTCTA 
1101 ATAAOTGGGA AGCCATCCAA CAAAGAAGCC ATGACtAfirT AAAGATATTT 
1451 GCAGAGTTAC ACCTTCCTCA TAAGTCCTTT CTCACCTTCA TTATTTTCCC 

uoi TtACTcrrre catgagacca cacaacaaaa ccattaaacc cgtcgctcct 

lill TTAATATTAT TATTATTCTT TTTGAGACAA GCTCCCTTTC TCtCACCCAC 
1(01 CTTACACTAG ATTTCACTCG CACAATCTTG GCTCACTCCA ACCTCTCTCT 
1(51 CCTCCCCTCA ACTCATCCTC CTGCCTCACC CTCCCAACTA GCTAGGACCA 
1701 CACCTCCCTC TCACCATCCT TCCCTAATTT TTTTCCACAA ACCAGGCCTC 
17*1 ACTATATTCT CCACCCTCAC TCCCTCTTTT ATTAACCACT CATTACACTC 
1*01 CCCAACACCC AACATASACT ACTTOCTCTC CTCCTCTCAA TTTTCTTT CA 
1«51 TCACCGAGTC AATATGTACT CCAAACAACC ATOTACCAAA AAACACAACC 
1901 TTCATCTTTA ATAAAAAAGA ACTTGCTTTA TTTCCAAAAT AAAf CCCCTC 
19M ACAAAAAACC TCCTCATCTT AAGCAATTOA CTCTCTTACA GTCCACCACA 
1001 AGACCTTAGA CAAAAAAACC ACAACCXACT GCACTAGAAA AGCAACCATC 
1051 TACCATATAC TCACTAGTGA AATTTAATTT TACTCACTCT TAGCTATCTA 
2101 TGCCAATTtC TTTTCATACT TCACTTCCTT TTOCAATCTC CCTTATACCT 
1151 AATATTTAT7 TATTCACACT CAT. AAGCATC AAATATTTAA TCCCCTCAGT 
1201 CCCAAATTTG TGTTTAAACT CAATGCAATC TAATATTTCT TTATCTCCTT 
2251 ACTCCCTCTA AAATGTTACC TCACCCAACC AAACCGCACA AATAOCAATG 
1301 CTTGTTCCTA ACCTATTGCT TCCCCTCCAT CTCTTCCTAA AGAGCAGAAC 
2311 TTCCACTTTC TCCTTTATCT A6AGAACAA0 TAACTTA5CC TGTATTTCCA 
2401 ATCAAATATT CATACATAfT CAAACCTTCT CTTTACATCA AATATCTTTA 
2411 TTATCAAGAA U t CCI IITTC CAATTCTCTA CATTAAATAT ATCTCITTTA 
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Gntry AC0O414I fro» dtcaban EM1L: 

HOMO uplilil cnroaoioaa II, cton* HCIT324CS, coaf>l*t* taquanca. 
Icbi* - S24S, P - 0.0**M, Idintltici - 10O/104) 
3 axoni 

Entry H)Slt3*l (rw cuttOaia CK»t: 
IWUg 3Tf T1OT-MKIJHI*. 

•cor* - 1005, r - 1.1W», idantitl** - 201/201 

Entry HKOn Iroa diciDua CMSLi 
lunu ITS IHCC-JMH. 
Scott - «S, r - 2.M-J7, iwntitiai - 20V21S 



No Medline entry



Peptide information for frame 2



our I cos 15* bp to lit* bp; poptid* Itnotli: 101 
Category: putative protein
Classification: no clue



l rjvopittyo ttrrerrrorc; huuyiaitjo ylftvmqavt qvilkkloci 

SI PDCOISPVQt MCCTEKIXCr MKCMrONirS KMEQI.rLOl-1 MIMHIIL* 

iai EDKCKcrm ttoroMLOM ieolqekyct elct*oalw eleeqkivo* 

ill kikotltftd euwvchokg Tior*xsi.v» ivouwcloh Turnus* 

201 KLKIS 



No BLASTP hits available

Uitt BUSTP hit! tor Mrip&t«n_9i20, (rwa 2 

NiMH MUM tor KIAA013* 0 



>TKCNSUIE1(:KSAB2324 1 (Mi: 
conoltt* «». 

Lanotli - l.MJ 



KIAAOJK-t Niwn aRNA (or KIAA033* «*n* 



KSPt: 



Sbjet: 
Ouary: 

Ouary: 

Sbjct: 



7S/140 IMII 

«s EKru:rmuwrnii.r3KMEOLFU}Lii.KiPi)fiLLrcsi(CKiTPVSEC0 ronw*E wo 

EK CF»«. H tHI. »E0 »L A ILL *0 ♦ * D * L**» 

79* EKEKCriKEH-EN»K.LC0K--ELRDHMILlLL-KOSLMSPSVKHDPL»SVKELEEK 151 

III I CQt^E* - It Y KTELCT K0AI LAEL EEQK 1 VQAK LXGTLT FTOELKHVCWrtCTS DFRES 1, 17* 

IIL» UI K L+A ♦ ♦*!!♦. UtT T +EL *♦ * 1* 
#J2 I[Kt.El^tEKCEKIMKIKLVA-VIUU(KEUUSRK£TOTVKtELCSLRSCI<-'0QLSMM *0t 
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OKrlphtttJ ancodaa • ituvtl 104 •■ino «eiO protain with partial almilaflty to X. lnvli 

katanin p80.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific

similarity to C-terminus of katanin p80



insert length: 2676 bp

Poly A stretch at pos. 2665, no polyadenylation signal found
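Annotations of this kind (a terminal poly A stretch and the presence or absence of a polyadenylation signal) can be checked directly against the nucleotide listing. The sketch below is illustrative only: it assumes the canonical hexamers AATAAA and ATTAAA as signals, a minimum stretch of 10 adenines, and an optional search window at the 3' end. None of these thresholds is stated in the source, and the function names are hypothetical.

```python
import re


def _clean(seq):
    """Remove whitespace (block separators, line breaks) and upper-case."""
    return "".join(seq.split()).upper()


def find_polya_signals(seq, motifs=("AATAAA", "ATTAAA"), last_n=None):
    """Return sorted (1-based position, motif) pairs for each signal hexamer.

    If last_n is given, only the last last_n bases are searched.
    """
    cleaned = _clean(seq)
    offset = 0 if last_n is None else max(0, len(cleaned) - last_n)
    region = cleaned[offset:]
    hits = []
    for motif in motifs:
        i = region.find(motif)
        while i != -1:
            hits.append((offset + i + 1, motif))
            i = region.find(motif, i + 1)
    return sorted(hits)


def find_polya_stretch(seq, min_len=10):
    """Return the 1-based start of a terminal run of A's, or None."""
    match = re.search(r"A{%d,}$" % min_len, _clean(seq))
    return match.start() + 1 if match else None


# Toy sequence, for illustration only.
toy = "GGCC AATAAA GGTTCC " + "A" * 12
```

With the toy sequence, the signal is found when the whole sequence is searched but not when the search is restricted to the last 15 bases, which shows how the choice of window can decide whether a signal is reported.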



I CTCTCTACGC TCCCQCCCCC TGGTCGTCAG CGCCGACGCT CGGCTCAGGt 

11 GCCCCCGTAC CArGAGGCGC CCGTACrTAA CAGATTATGG CATCAGAAAC 

101 CCACAATGTT AAAAAACCCA ACTTTTCTAA TAAGATTGAG GATCATTTCA 

111 TTGATCTTCC TWMAMW ATCTCTAATT TCACTAATAA GAACATCAAC 

J01 GAGGTTAAGA AATCTCCAAA ACACTTGGCT GCTTACATAA ATAGAACACT 

ISl TGGACAAACT CTCAAAACCC CACATAAACT rCCTAAACTG ATCTATCGCA 

J 91 GAAACAAAGT TCATCATCtC TTTCCAAATC CTTCTTACAG UUMKM 

JS1 TCCCCTCCAA CTGGGGGCTG TCACATGGCA AATAAAGAAA ATGAACTCCC 

<ei TTCTGCAGCC cacctgcctc aaaaattaca ccatgatact ccaacatatt 

til TGGTTAACTC CACTCATTCT GCTTCTTCAC ACACACAAAG CCCAtCATCA 
SOI AAATATAGTG CC flf rtTT t TCAGGTTTCT CAGGACCATG AAACAATGGC 
SSI CCAACTTTTO TTCACCAtKA AtATGAGATT CAATCTAGCT TTAACTTTCT 
Ml GGASAAACAG AAG7ATAACT GAACTT6TAG CTTATTTGTT GAGGATAGAA 
«S1 CATCTTOGCG TTCTGCTAGA TTCCCTTCCT CTCCTCACCA AT1 G I U ACA 
T01 GCAAGAAAAA CAATATATCT CACTTG5CTC CTCTCTTCAC TT G TTG C CT C. 
1S1 TACTAAAGTC ACTACTTAAA AGCAAATTTG AACAAT»TCT TATACTTOGT 
*0l TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGCtCAG AACTATCATC 
ISL CAAAACAGAA ATTATAAATG ATCGAAATAT TCAAATTTTA AAACAACAAT 
*01 TAAGTCGATT ATGGCAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT 
• SI ACTGGTAATA TAGCTAAGCA TCTAGATGCT TATTTA1TAC AGTTACATTG 
1001 ACACArrrCA TCTACTAAAG ACCATTTGGT TTtTCAAAAC ATCCCTGAAC 
10S1 TGTATAATTT ACAAAAAAAA AACTCTCCTC TCAGAACTGT GAACTGTGCA 
1101 ACAAATCAAA ACTATTTTTT CTTTTAAAAA OCCACGTAAT GAAACCACTA 
USt ATCAAATCCC AGCAATCTCC TTCACATTGA AGTCCAAAAA TATCCAAAAG 
1Z01 CAGCAGCTTC AATTTCATTG AGGTCAAAGT GCACTATCA* GATTGTTCAC 
13S1 CTTTGCTGCA TTTGGGAGTT ATATCCTTAT TTGGTAACAT TAAOAACTAC 
1301 TCCATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATCTCAAAA 
I3S1 AATAACACAC CACTTACCAC TASCAACCAC CAACACCAAT CATCATTAAC 
HOI TTTTTTAACA TTCT GM ITA TTAAAAAAAA AAAACACTTA AATGTGTCCA 
HSl CCTATTTTCT TATGTTGAAA ACACTGAAAC TTTAAAACAT GAAAAAAATC 
I S01 AATATTAAAC ATTTTrrCTT CACACTGAGA TACTCTCTAT GTAAAATGCC 
IS5I TTAATTATTA ATAACCC.-.XT CTCTTATGAT ACCAATATCT CTTTTAAAAA 
1W1 ACTAAAACCA ACCATGCTTC TGGCATCATA AAATCAIttCA ATTAAATCAC 
ItSl CCCTTTACAT TCTTCTACAC TCTTCTT6AA ACACTCTCTC CACCATTTTT 
1701 AAAACTTCAC AATACTTTTA CTATCTCTCA TATItTt l CC CACAATCATC 
HSl ATCTCATCTA TCAATCTCTT ATCCCTATCT AACCAAAAAC CTCAATATGT 
1101 TTTTCTATCA ATCTTTAACT CCAAATCTCC ATCCACTTCC CTAATTTATA 
ItSl TTTACTTTTT ATTCTACATA CATTTCTAAT ATTTTTCATT CCTCTATCAT 
1901 TTAAACTTCC TTCATTTGAC TAAATTCACT AAATATTTCT ATTTTTTTCC 

mi m r riAAAT tctcatttta tatcaattct aattcttttt cactacatat 
1001 CTTTTAAACA cttacataca gtcatttaca atcctttaca cttaatcctg 

1031 ATCTTCTATT TTAAATTCCA ACACTTTCTC TCACTACCTC CTCTAATCCT 
1101 TACTATCATA TCCTACCACA CTCTATCACC TCTTTTTTTA AAATACCACT 
I1S1 TTTACTCICA CTCAACCAAA TTCTCCAATC TCTtAACAGC TCTAAATCTT 
"01 ACTTCTCTTC AAMTCATTS CCCTTTAATA CCACTCCTGC TCCTTCACAC 
Z1SI ATCATCCCAT CCTTAATATO CCTCACA6CC ATCTCAOCAA ACCTTTTTAC 
110 1 TAATTCAA7T TCTCTGCAGT ACTCCTTCA* CCACTTCAAT 6TAAACCTTT 
!)M ACCATTTATT CGTTTAATCA Ct ACT CAT AC GAATCTCAAC CACATTTCTT 
2*01 CCTCTTAAAA CTTATGTTTC ACTCACTTCT OSTTTTCTCT ACCTATATTT 
HSl TAIATAGCTA CATATTCCTC ACACTGAACA TCAATTCTAA TAATTCGTTA 
ISOl TTTCCTTAAC TCTTTACATT ATAATAATTT CAGATTATTG CACCTCTCTO 
ISSt ATTTGACAGG TGAGTTATTT AACACCCCAG TTTTCAGGAC ATGGCAATTT 
1(01 CAATTGTAAA CCTGTTATCT CTCTGAAACT TTTAAMTGA TAAAATATAA 
2*S1 CCTTTCTTTC TGCTTAAAAA AAAAAA 
KAirTNMVm RNrCKKItSH riOLNtXJCtl XrTWMWKEV KKSHCOLAA* 
IXHTVGQTVT (HUXMVIt UtKJCVKMrrf NKTRMOSr GIGGCOKUtF. 

ckclacaghi. rexLHHDirr tlvwssewcj, igrnmiiv terrsevtoo 

MCTHKOVLrS MOUUJIVkLT nWKJUtSEL VAYLLK1EBL CVWDCLFVL 
TKCLQEEKQT ISUSCCVBLL rLVKSLWSK rttrVIVCUl HLQAVHtMW 
1EL3SKTEII WOCWiaiWO OLSCUTIOeX HLTLVKYTG KIMDVDAIL 



Alert BLASTP hits for OXnpht»» J_»kII, frame 1 
TMJ11L:»rOSMlIl_l product: -ptO kit»nln-i i*nopu>^ !*•*>* p«0 

T»E)«L:Ar05iO2 I product: -k*t«»ln plO lubunlfi Mono upic 
plO lubunit MtMA, coapltt* cdj.. N - 1. leer* - ISO, T - 1.1* 

t«DOL:*r05i*J)_l product: -tjt*nin p«0 lubunlt": stronayloc 
>iRHfflL:»r«)!«?.l product: -**t*nln plO iubunit-j Hoao »«pi 

H3»ll 
We claim: 

1. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hftrr2_l6el4: hfbrt_l6f21; 
hfbr2_16|l!: hlbrt,l6ii2: Mbr2_l6k22; hlbr2_l6U2; Mbr2_22m: Mbr2_22hl3; 
Mbr2_22hl3: Mbr2_22i4; hftr2_22k3; Mbr2_22kS; Mbr2_23bI0; bfbr2_23b2l: 
htbrtUtt: Mbr2_23i24; hfbr2_13nl6; hftr2_23o24; hlhrl_23oJ; hflrt 2a2; fcftr2_2b!7: 
hft»2_2bJ: Mbr2_2cl; hibr2_2cl7; hftf2_2cll; hfhr2_2dlJ; Mbr2_2dl7; hfbf2_2d20; 
h(br2_2ill; Mbr2_2hl : Wbr2_2hl£>; hfbfl_2il7; Mbr2_2kH; hlbr2_2k]9; Wbf2_3bl6; 
Mbr2_3cl»: Mhr3_3fl6; hfcr2_3(B; hfbrt_3U; hflw2_41mlS: bfcr2_62bll; bfbr2_62fl0; 
hlbr2_62U9; MbrtWnlO: Mt>r2_62ol7; Mbr2_64*li; hfbd>*l3: Wbrt64el6; 
hft*2_Wc4; bftrt_64h6; Mbr2_64i20; Mt«2_64jll; hlbr2_64k24; Mbr2_64oI6; 
Mbr2_6iJ7; Mbr2_6b24, hfbr2_«i20; Wta2_6ol7; MbiT 71o20; hftrt_7ZbU: 
Wbr2_72dl3; hfbr2_72U2: bftr2_72ml6; hlbr2_72nl2; Wbc2_7Sc24; Mbr2_7M13; 
Wbr2_7Sk24; hlbr2.7*n23: Mbr2_7»24: Wbrt_7e22; blbr2_7j4; hftr2_82c2fc 
hfbrljtett; Mbr2_S2el7: hfcfl J0el7; hfbr2_S2c4;; Mbrljtto*; Mbc2_B2il4;; 
hfbrl JOjH; Wbrt_82il7;; WbrlJO; hftr2_S2i24 :: hftrlJO; hfbr2J2ml6:: hftrlJO: 
Wbr2_!2m6;; MbrMO; hfW2_lj9; WM2_24ilJ; h/M2J4blJ; hflcd2_24e23; 
hfkd2_24n20; MWI_24p5; Mkil2,3il3: Mkd2_3ol7: WM2_46i6; bfkd2_46bl0; 
MfaQ_46<i]3; hfkd2,46j20; Mkd2_46kI9; bftd2_46ni4; hflcd2_47»4; bik<n_4b6; 
hftd2_4cB; Mfcd2_4kU: Mfa£2_4nill; hraefljill: Iuncn_lc23; hmcfl_lelS: 
bmcfl_1 ( 13; blw3_ln3; hte»3J4tJ; hw3J4h21 : bw3J4pl4: httU_14p7; 
hte*3_l Jtl3: Htei3_13c24; taei3_lSe6: hteriJJfM; hte»3_15bl; hietfJJiJ: 
hw3_UjI8; Hw3JSj3; h«3_lJkll: tae*3_l7flO; htt*3_17tl7: hteiJJ7nU; 
hw3_l7nia: Hte*3_18f3; hte*3_lSI7: htetf J9T19; htei3_19j17; htd3Jel; hteOJ|l3: 
ta3_lkll; lue*3_20e21; tac*3_2*2: btei3_20rolS; bw3_21<M: ate*3_2ljl5; 
tae»3_:iH6; htei3_2ln23 : bto3_Z2c23; bto3_22f2; htet3_22nl3; bte*3_23lll: 
hm3_23al9: Hle*3_23nl9; hta3_26f22; hte*3_27<Jl; httO_27k4; hm3_27ol4; 
h«e*3_2Ml4; btn3_2ill; bei3_2»l7; bM3_2dl3: bta3_2cl2; taeUJfU; taei3_2j7; 
hKi3_2hl; hiet3_2fcl5; htet3_21l9; Iuei3_2nil»; hta3_2m20; tnci3 2n9: htet3_2ol3; 
bw3_30f4; HtaJ_35b4; hte»3_3JbJ; h»3_33e2l; htei3_3J|6: tae*3_Wkl6: 
h«et3_3Jfc24; hte*3_35nl2; htet3_33n24; h«eri_3Sn9; h(e»3_33pl7; hw*3_35p22: 
tae*3_4b4; htta3_4n7; bte*3_4rj; hte»3_4b6; bteU_4o19; htei3 50)4; taa3_5(WD6; 
ted3_50n23; htn3_6l>2l: hw3_6cll: 0tet3_6dl6; htes3_72kl]; Hte»3_72kl3; 
hte*3_73pl6: hiEi3_TbI2: hte»3_7dl7; hte*3_7j3; btet3_7jS; hte»3_7pl0; hle*3_7p9; 
b»n3_8e24 : me*3_!gll; Hte»3_lg3: bte»3_SralO: Hm3_8p7 : Httt3_9c22; Hw3_«20; 
Hici3_%22; butelJ7i7; hutel_l8cl2; buieMI119; hutel_lSi4; hutalJSIl; 
but£l_19fl9: hwtl_19gl9; bwel_19g22; huteM5*]7; hutcl_19jll; hutel_li2; 
hutel_20bl9; hutel_20»21; hutel_20h!3i hutel_20mll: hutel_20ro24; butel_21dl3; 
bwsl_22d2; hu*rl_22*l2; buttl_22n2; butel_22o2; butel_13el3: hutel_23gll; 
btuel_24cl9; hu«el_24ell; butel_24j6: hutel_2h3; their complements; and variants 
thereof. 

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_l6cl6; hfbr2_16f2l; 
Mtaljogll; hfbr2_16il2: Mbr2_16k22: bibr2J6ll2; Mbr2_22t2l; hlbr2_22hl3; 
hfcr2_22hl3: Wbf2_22i4; Mbr2_22k3: hfcr2_22kS; hftrJ_23blO; Mbr2_23Wl: 
blbr2_23n; hfbr?_23t24; ; Mbr2_23nl6; hfbr2_23o24; hAr2_I3oS; bftrt_2i2; 
hfbrt_2bl7; hfbf2_Zb3; hfbr2_2cl; Wbr2_2cl7; hlbr2_2ell; hflw2_2dl3; Mbr2_2dl7; 
hfbr2_2d20; hlhf2 2(U: hfbr2_2hl; hfbr2_2MO; blbr2_2il7; hfbf2_2k)4; hft>r2_2kl9; 
hfbr2_3tl«; hfbr2_3fl6; hfbrt_3gl; hfbr2_3t2: hfbr2_4lml3; hfbr2_62blt; hfbr2_62nO: 
bfbr2_62119: UbtI_«nlO: hfcr2_62ol7; h»brt_64iii; bfbr2_64e]3: Mbr2 64cl6; 
Mbr2_64c4; hfbr2_6*h5; Mbr2_64i20; hfbr2_64jlS: hfbf2_6«k24; hfbr2_64ol6; 
hfbr2_6al7: Ubrt_6hJ4; bibr2_6i20; Mbr26ol7: hfbr2_7lo20; hftr2_72b»; 
Mbr2_72d13; h(hr2_72ll2: Wbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_7Sdl3; 
bibr2_7Bk24; bftr2_71n23; bfbr2_7»24; Mbr2_7e22; Whr2_7j4; Wbr2_B2c20; 
Mbrl_10c20; Mbr2_S2el7; Mbrl_I0el7: Mbr2_B2e4: KfbrlJ0e4; hfbr2_S2gM; 
Mbrl_l0gl4; hlbr2_S2il7 : hftrl_10; hfbr2_e2i24; hfbrl_10: bfbr2_S2ml6; hftrI_ICh 
hfbr2_82m6L hfbrl_10; their complements; and variants thereof. 

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_l6GI ; hfbfl_16t22; 
hftx2_22Gl; Ubr2_22hl3; hlbr2_22i4; Mbr2_22U; hfbr2_22kl; Mbr2_2]r2; ; Mbr2J3o24; 
hlbr2_23oS; hfbr2 2i2: hfbr2_2e I ; hlbr2_2ell; Whr2_2d20; hftc2_2g 1 1: Mbr2_2hl ; 
hlbr2_2h 10; hfbr2_2k19; Mbr2jn*; hfbr 2,312; hftr2_62nl0; Mbr2jS4al 1 ; hjbr2_Wel 6; 
hfbr2_64c4; hfhr2_64h6: nfbr2_64i2Q; Mbr2_64ol6; hlbr2_«*l7; hibrt_6i20; Wbr2_7lo20; 
Wbrl_7MI3: hfbr2_72inl6; hfbr2_72n12; hfbr2_7tdl]; hftr2_7I(03; hfbr2_7a24; 
hfbr2_7c22; httxlj}*; hfbrZ_»2ml6; and hfbrl JO. 



4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_I4»lJ; 
hfkd2_24oI5; bfld2_2*e23; hftoJI_24a20; bftd2,24pj; hifcd2_3il3; bftan_3ol7; 
hfkd2_46»6: hfU2_46blO: Wkd2 46dl3; hfkd2_46j20: hfkd2_46k)9; hftd2_46m4; 
Mkd2_47a4; bIM2_*b6. bftd2_4cl; hftd2_4kM: hftd2_4mll; their complements; and 
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24e23; 
hfW2_46a6; hfkd2_46b 1 0; hftd2 _46d 1 3; hftd2.4b6; hfiJJjtcl ; their complements; and 
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hwcfl_Ul 1 : hmcn_lc23; 
hmcfllell; ttmcfl l|13; their complements; and variants thereof. 

7. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lc23 hmcf1_lgl3: their 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: bbtt*3_ln3; btea3J4g5; 
hteaJ_14hII; □te*3_[4pM; btes3_14p7; nte»3_15al3: Htei3_l5e24: bfes3_13c6i 
bte*3_lJgl4;btetf_l3nl: meU_15i5; hteriJSjll; Hte*3_l5j3: btet3_Ukn : 
hte*J_l7ri0; htes3_l7N7; htes3J7nl2; htes3_l7nlS; Hte*3_lBf3; hteUJSP; 
btes3_l9fl9; htea3.t9jl7; bte*3_lei: b**3_lgl3: tad.lkll; hte*3_20c2l: hte»3J0fc2; 
ntet3_20mU; htt*3_21d4; ttalJ\jlS. hte»3_HlI6; htes3_21»23; htdJ 22c23: 
htet3_22g2; htes3_Z2nl3: hK*3_23lll; hies3_13nl9; Htes3_23nl9: hte»3_26f22; 
tate*3_27dl; bte*3_27k4; bte*3_27ol4; htrs3.2Wl4; hte*3_2all: btes3J»17; htet3_2dl5 : 
hte*3_2el2; htes3,2n4 : htE*3_2g7; tae*3_2hl: htetf^ZblJ; htei3_21I9; hte»3_2inll: 
htes3_2mIO; hte*3_2n9; bte*3_2oO; toei3_Jpf4; Htes3_35W; hte»3_33h3; bte»3_3Je2I; 
btei3_3J(6: htei3_33il6: bte»3_35k24 : tae»3_33ol2: nteriJJoW; hte*3_3So9; 
hte*3_3Jpl7; hta3_3Jp22; htei3_4b4; bw3_4fl7; htei3_4f5; hte*3_4h6; hw3_4oI9; 
bcc.3 50j4. r*e»3_50n06; btnJ_50n23, bte*3_6b21; hea3_6cll; «e»3_6<)16; ht«3_72kl I; 
Hte»3_72kl5; r*33_72pl6: htei3_7b22; bted_7dl7; hUtiJIH; htctfTjS; bte*3_7plO; 
b*i3_7p9; btei3_ie24 ; Ha*3_»|ll: Htt3_8|S; hteO.talO: tteO_<p7; Hte*3>22; 
Hte»3_9i20; Hia3_9k22: their complements; and variants thereof. 

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: bte*3J4g5; htei3J4pI4; 
hte»3J4p7;h««3_IS»13; hin3_15(l4; Me*3_ISl>l;htei3J5jlt; hlcj3J7jlO; HtedJIO; 
biei3J9fi9; hte»3_l9jl7;htei3,Mc21; hteU_2lnI3; bte»3_22e23; hu»3_22nl3; 
Hte*3_23nl9; hted_27ol4; hle»3_2MI4; nie»3_2»! 1; htei3_2dl3: hte»3_2fl4; huCJgli 
htc*3_Ih)3; hteU_2H9; bte»3_2m20; hteU_2n9; hto3_30f4; tae»3_33(6: bieU J5n24; 
hta3_35pl7; ht«J_4b4; hte»3_4n7; h»ei3_4ol9; tect3_SOj4; hte»3_5&i23; htc»3_50nO6; 
htei3.6b21; h»3_6dl6;hte»3_7Zkll; bta3_7<ll7; bted_7jl: H«a3_l(l I; Hm3Jt5; 
Hle.3_tp7; Htei3>22; Ita3_9i20; Htei3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: Ubr2_l6f IS: bfbr2_2kl4; 
H**3_35W; htet3_35p22: we*3,7j3; htei3_7plO-. hwelJOml 1; their complements; and 
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbrt I6cl6; Mbr2_2h3; 
htet3_I3i3: htei3_un; maSJkll; Itei3_72kl3: btei3_7b22; hutel J9g22; hute1_24j6: 
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hftrt.MIJ; htei3_35e2l; 
hutel_2hj; their complements; and variants thereof. 

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23t24; hfbr2.2)17: 
hibrt_41mlj; hfbr2,62n0; nfbrl_«2ll9; hfbr2_64jl«; 
bfkd2_I4«20: bAd2_24p3; hfkd2 *kM; htc*3_lf13: (tteU_IU16; htct3_23IU: 
hte*J_26|22; hte*3_4h6; ht«3_72pl6; hutel_19bl7; hutellttiU; hutc]_24c] 1 : their 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: Wbf2_3g8, hftr2_62ol7; 
hfbr2_c*I4; hfbr2_7Stt4: cflnd_24til3; MiiC 3olT; WluQ 46j20; btoJJ7117; 
htei3J7olS; htet3_27dl; hie»3_2al7; ht«3JJb5: htei3_3Jklo; hte»3_3Si>12; 
htn3_35rf>; bwet_20il9: hu*l_20m24; butel,23ell: their complements; and variants 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: Wbrt.ZJblO; hfbr2_3clS; 
hfbr2_64alS; hfbrt6ol7; htbrl_72b1!: hfbr2_T2ll2; Ubt2_l2i24{Ubrl_l0l ; 
wa3_l4h21: Hau3_lSj3; tnaSJOmli; btta3_Z2ii; btei3_2ffllS; Wa3_7p9; 
htei3_«ml0; butelJStl: their complements; and variants thereof. 

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hlbr2_23b21; Mbr2_23nlo; 
nftr2_2cl7; hnx2_62b11; bfcr2_7!cM: hlbr2_»2e4 <hfbrl_l0c4); Ulbr2_t2jl7 
CbftrlJO): nfbr2_l2m6 (Mbrl J0) L Mkd2j46n>4; hieU_15k]I; teei3_lel; hhte*3_ln3; 
htc>3_20U; hte*3_21d4; htei3_23n]9; Bte*3_4£J; hte*3_6cll: hte*3_St24; btttel.Wiil: 
huttl_22<E. huttl_Z2el2: their complements; and variants thereof. 

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: nftr2_l6tI2: hfbr2J«12: 
ttfbr2_22M3: bfbr2_2617; Wbr2_2dl7; Wbr2_641i24 : hfbr2_82c20 (hfbrl_IOc20); 
hfbt2_S2tl7 <hftrl_l0el7); hfbr2_82|l4 (hfbrl_I0(14>; hlk«I2_24al3: hfW2_3iI3: 
h/kd2_4n>ll; bmcfl.lall: hmcfljelj; taee3_ISc6: tae*3_2oB; hte*3_27M; h*e*3Jhl; 
Bttt3_3Jk24; hntel J9fl9: and huttl^24cl9; their complements; and variants thereof. 

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: bJk<£2_46kl9; hftd2_47a4; 
Iuc*3_2cl2: hte*3_2ljl3; hte*3_l7nl2; huirl_1SII9: hutel_lL2; their complements; and 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: huie I_l7k7; butel_IScl2; 
butcI_lSil9; huttllW; outcllSII; butel_l9fi9; butcl _I9|19; butel J9|22: 
lw»el_19hl7 : huiel_19jll; bu*l_lil; I««l_20bl9; butelJOfll; butslJOhlS; 
butel_20mll: hutel_20m24; hwel_21dl3; hutel_22<tti hut«1_2Ie]2: buteI_J2fl2; 
buul_22o2: butel_23«t3; b«el_23tll; hu*l_24cl9; hutel_24ell: butel_24j6: 

bute I^IhJ; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: fcutelJTkT; hut* I _IIcl 2; 
hotel,! Ml huttl J9gI9; butel_l9jll : huie1_22ii2; hwel_2ldl J: twte]_22o2; 
hutel _23(l I ; their complements; and variants thereof. 

21. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
Whr2_I6e16; hfbr2_16Gl; nfbr2_l6tU; hfbr2_l6«12: hfbr2_16k22; Ubfl.tMiI: 
biV2_22f2l; hfbr2_2Ilil3; hfbrt_22J>13; hfbrt.IIM; hfbt222kJ; hftr2_Z2M; 
hfbr2_23bl0; Mbr2_23n21; hfbr2_23f2: Mbr2_23B4; ; hfbrt_23nl6; hibrl_23o24; 
hfbr2_23cJ; bftr2_2a2: bfbr2_H>17; hfbr2^2bJ; bfbr2_2cl: bfbr2_2c17 : Wbrt.IcH; 
bffcrtJdIJ; Mbr2_2dl7; hfbr2_2d20; MbrtJllS; hfbr2_2hl; Mbr2_2hlO; hfbrl_2il7; 
bftr2_2kI4; htBr2_2kl9; hfbr2_3c1!; hfbr2_3fl6; hfbr2_3|S; hfbr2_3t2; hfbrtJtmlJ; 
nfbr2_62b1l: nfbr2_62nO; hfbr2_62119; hibr2_62»10; hftrt_62ol7; Ubr26«tll; 
Mbr2_64*U; Mbri_64cl6; Mbr2_6*c4; hfhr2_64hj6; bftfl_64i20; hfbr2_64jlS; 
hfbr2_64k24; hfbr2_64oI6; hibr2_6al7; hfbr2,6b24; Wbr2_6»10; hfbr2_6ol7; 
hftr2_71o20; hftr2.72bl8; taftr2_72dl3: bftri_T2J12; hftr2_72n»16; hlbr2_7Inl2; 
hftrt_TSc24; hfbf2_7ed[3; hfbrt_7Et24; nfbr2_7Sn23: hfbrj_7i24; hfhr2_7e22; 
hfbr2_7j4; Mbr2_!2c20; hlbrl JGttQ; hfbr2_ttcl7; hfbrl JQel7; Ubr2_l2e4;: 
hftrl.iew; hftr2J2»14:: bfbr1_10f 14; hftf2.82in : ; nJbrMO: hftr2_I2i24 ;: hfbrlJO: 
nfbr2_S2mI6;; hfcrllO; bfbr2_82jn6:: hftrlJO; hftd2_lj9; hlfcd2_24al3: Bftd2_24bl3: 
ofkd2_24c23: bfkd2,24n20; hfkd2_24p3; hfluO_3il3; hfkd2_3oI7; hftd2_46a6; 
hftd2_46M0: Wkd2_«dlJ; bfkd2_46j20; blfcd2_46kl9; MM2_4««o4; hft<I2_47a4; 
MM2_4li6; Mkd2_4cl; hftdl_4kl4; bfkd2_4mll; hrocfljill: hroc.n_lc23; hnxflJelS: 
hmcn_Ul3; hhto3_1n3: taei3_H»J; te*3J4h21; hie»3_l4pl4; bw3_l4p7; 
tao3J5*13: Hin3J5c24; bietJ_15c6; tee*3J3|14; ta3JJhl; to3J5L3; 
heO_13jI!: HteJ_IJj3: tel.Ukll: tee»3_17nO; taeU,l7ll7; hte*3_l7nl2; 
teeO_l7nl8; Htei3_l8D: htdJ_HI7: n*»3J9n»; ta3_ ISjlT; braJJcl: h»J_ljl3; 
htei3_lkll; h*«3_2ft:2]; hte»3_20k2; htei3_20mll; Ina3_2]d4: hte*3_21jlJ; 
to3_IUI6: bte*3_21nZ3; beU.22c23; hte*3_22»2; btei3_22al3: lw»3_23t]l; 
hta3_23nl9; Hle»3J3nl9; h*i3_26g22: tae*3_27<ll; Iuei3.27k4; tae*3J7ol4; 
KeU_2Ml4; hte»3_I»ll: htei3_2il7; tae»3_2dl3: teo3_lt]I; h»3_2fi4: hea3_Ii,7; 
btei3 2hl; tue*3_2M5: hte*3_2119; tucOJmlS; btaiJmXK hfei3_2n9; tacO_2oO: 
htn3 30f4; H«t3J3W: bteU_33W; htei3_33e21: hfc*3_33f6; hw3_33kl6; 
hto3_J3k24; trus3_3Jttl2; btetf_35nI4; ta3_33n9; blet3_3Jpl7; b*ei3J5p22; 
hte*3_4b4; hle»3_4n7; bte*3_4fj; htei3_4h6; luet3.4ol9; htei3_JOj4; htti3_50n06; 
btet3_30n23; hta3_6b2l: te*3_6cll; htei3_6dI6; htti3.mil: HJea3_72kli; 
te»3_72pl6; htei3_7b22; bu»3_7417; tMe»3_7j3; biert_7j!; hte»3_7pl0; ta*3_7p9; 
I«ei3_fc24: Hte»3_Bfll; H*i3_B|Ji btei3_8mlO; Hterf_8p7; HteU_9*2I: Htod.9L20; 
Hua3 9k22; tutel_ITk7; hutcl_l5tl2; bmelJSil9; hutel Jli4; hutcl JS11; 
t«ae1_19fl9; hucl J9|I9; ta«l_l9(22: hulel_19tal7; nutel_H>jli; HuiBl_li2; 
twtel.20bl9; nutel_2Q|2l; butel.2Cb)3: tntel_2Ctall: bwelJOmW: butelJldlJ; 
tvutl hutel_22el2; hutel. 22n2; tutet_22o2; huicl_23eI3: hutel.Uill; 
huttl_24c19; hutel_24c1 I; butt 1.24)6: bttel_2h3; their complements; and variants 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
Wbr2_16cl6; bftr2_16IIl ; Ubrt_l6(lS; MbrJJ6il2; Ubr2_l6k22; Wbrt_l«12; 
Wbf2_22f21: hlbr2_22hl3; hlbr2_22hl3; Mbr2_22»4; hfbr2_22k3; blbf2_22kS; 
hfbr2_23bl0: Ubr2_23b21; hfbrt J3C; Mbr2_23t24; ; UM23nl6: hfhr2_23o24; 
tafl>r2_23oS; Wbr2_2a2; Wbt2_Ibl7; hftr2_2hJ; hftfl_2cl; Wbf2_2el7; hfbr2_2cll: 
Mbrt_2dl3; Mbr2_2dl7; Wbr2_2d20; hfbr2_2ili; Mbr2J21i1; MhrtJhlO; Wbr2.2il7; 
kftrt_2t14; hlbr2_2tl9; hftr2_3cl<; hftrt_3n6; Whrl_3|!; hfhrtJQ; hfbr2_41ml3: 
hfbrl_62bl1; hfbr2_62n0: Mbrt.62119; hfbr2_62nl0; hfbr2_62ol7; MbrtWall; 
hfbr2_6«.15. hfbr2_64cI6; Wbr2_64c4; Mbr2_Mh6: hlhr2_64i20; hfbr2_64jlB; 
Mbr2 64k24; hftrt_64ol6; Wbc2_6il7; hfbr2_6b24; Mbr3_6i20; kfbr2_6olT; 
hfbr2_7lo20; h(br2_72bU; bfbr2_72dl3; hfbr2_72ll2; hfbr2_72ral6; hfbrt72nll; 
tfbr2_7Sc24; hfbr2_7Sdl3; hfbr2Jtt24 : Mbr2_7an23; tuV2_7a24; bftr2Je22: 
htbrtj)*; MbrlJlcXy, hfbrl JOctt; hfbf2_S2el7: Mbrl J0el7; hfbr2_«2e4; 
hfbrl J0s4; bJbrt!2|H; bfcrl J0e.t4; hfbr2_S2il7; hfbrl JO; hfbr2Jtli24; hfbrl JO; 
bfbr2JOml6; bftrl JO; hfbrt JT2«6: hfbrl JO; complements of the nucleic acid 
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
Mbr2_l6I2l; hfbr2J&k22; hlbr2_22fil : hfnr2_22hn ; hfbr2_22i4; hfbr2_22U: hft»2_22U; 
hfbr2_23r2; ; hfbrt J3o24; hfbrI_23oJ: Wbr2_2»2; hfbrt Jcl; hftr2_2cll; hibr2_2d20; 
a/brljg 1 1; hfbr2_2h! ; nlbrtjh 1 0; nfbr2_2k 1 9; hfbr2_3n6; hfbr2_H2; hft»2_62nl0; 
llfbr2_64a] I; Mbrl_64cl6; hfbrt JMc4; hfbr2J>4h6: hfbr2_64i20; nibr2_64k24; 
hnx2,64oI6: hfbr2_6«l7; hftr2_6i2t>; afbr2_7lo20; hfbrt J2dl3; bibr2J2ml6; 
Mbr2_72nl2: bJbr2_7Idl3; bibr2_7In23; hfbr2Ja24; hfbrt Je22; hlbr2_7j4; hfbrt J2ml6; 
hfbrl JO; complements of the nucleic acid sequences; and variants thereof. 

24. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hftd2Jj9; hfkct2_24ill; nfkiQ_241)lJ: h(kd2J4«23; Wkd2_24n20; hfkd2_24p3; 
bftd2_JilJ; titan Jail; hfti2_46a6; hfkd2JW>10; WM2_46dl3; Mkd2_4«j20: 
hfkd2J«kl°; hfkd2_46n>4; hflui2J7a4; hfkd2Jb6; afluttJcJ; bfkd2_4kl4: 
hfkd2Jtal 1; complements of the nucleic acid sequences; and variants thereof. 

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hffaOJjo-. 
Mkd2 J24e2J; hfkd2_46a6; hfluC JW»I0; bflaQ_46dl3; hfluCJW; hfkd2_4cl; 
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
nmcf]_UM; hmcfl_lc23; hmcfl.lcIS; hmcfllglJ; complements of the nucleic acid 
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl Jc23;hmcflJtU: complements of the nucleic acid sequences; and variants thereof. 

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhte»3Jn3; hted.MgJ; hte«3_14h2l; htei3_14pl4; taei3J4p7; htei3_13ilJ; 
HteOJJcM; hua3_l3c6: nte*3_IJ|H; bteUJShl; I*s3_l3i5; wetf.lJjU: KteUJSjS; 
bte*3_I3kll; htei3_l7nO: hta3_l7llT: hteO_l7ni: ; btei3_nal»; rta3JtfJ; 
htc»3_iai7; bte*M9fl9: htef3_I9jn; bte»3_]cl; htet3_lil3; htetfjkll; htei3_20c2l: 
htctfWU: bte»3_20rall; bte»3_2]d4; htetl_21jli; tae»3_:ilI6; toe»3_2ln23; 
tae»3_22c23; bte*3_22i2; hw3_2IM3; hte»3_I3ll]; htei3_23nl9; rlteri_23nl9; 
hte*3_26f22: htrO.I7dl; hta3,I7k4i bteUJ7ol4; ta*3_lSdH; mei3_2ill: 
hteU2*17; bte»3_M15; bw3_2*l2; b*i3_2n4; h«ei3_2i7 ; hteU_2hl; b«»3_Ihl3; 
hto3_2II9; htet3_2rflli; htei3_2m20; hte*3_2«9; hle»3Jol3; r*a3 30f4: Htei3_35b4; 
htcri_3JW; hte»3_3Je2l: bte»3_3Jg6; bka3_35kl6; hteOJ5k24; nte*3_3J«12; 
hh*3_33n24; htra3_3Jo9; Hei3_35pl7: hte*3_3Jp22: htei3_4b4: htei3_4fl7: nte*3.40: 
htcU_4h6; hm3_4ol9; h«cO_50H; hle»3_JOn06; blaJ_50o23; btet3_«b2l; ht£j3 6cll: 
nteri_6dl«; tecO Tlkll; Hietf72kI3; bteU,72pl6: tatci3^7b22: hte»3_7dl7; htei3_7j3; 
h*s3_7jt; bw3.7pl0; bte*3_7p9; hteO>24; Hte*3.SfII; HtnlJiS. htcri.bnlO; 
Hw3_«p7; Hte*3_9*22; H**3_9i20; H«»3_9fc22: complements of the nucleic acid 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: Ma3_t«»); 
hte»3_l4pl 4; hta3_Hp7; hterij 5*13: hlei3_l J|14; htet3J Shi; teeiJJ 3j 1 1; 
hte»3_l7nO; ma3.17nll; HuaJ.llO; bte»3_l9n9; tues3.l9j1T: hte*3_20c21: 
mo3_Iln23; tale»3_22c23; hle»3_I2nl3; HleO_23nl«; nte*3_27ol4: hteU_IIdl4; 
htes3_2il l;hta3_2dlS; hteO_2n4; htej3_Ig7; htei3_2hl5;hua3_2U9; htcO_2m20: 
htca_2n9; taei3_J0f4; htei3_35j6. htaJ _35o24; busl_33pl7; hta3,4M; bte*3_4f]7; 
tao3_4ol9; bte»J_M)4; htei3_JOn23: hteiJ_S0n06; htei3_6621: bic*)_6£U6; bte»3_72kH: 
hte*3_T<l I T; bterijjf ; H*al J|1 1 ; Hlei3_l|J; HtcjJ_lpT; Htei3_9«22; H»eri>20; 
Hua3_9t22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
Mbrl_16f I!; hlbr2_Ikl4; H»3_3SW; hle»3_33p22; tae*3_7p: hte*3_7pl0: 
hutel_20enl I; complements of the nucleic acid sequences; and variants thereof. 

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hlbr2_l6cl6; M*2Jb3; htei3_lSi5; hte*3_lM7; hkU,lkll; me*J_72*13; hteU_7b2I; 
butcM9|22: hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

nucleic acid or protein sequence of a clone selected from the group consisting of: 
hft*2 2dli: lttct3 M*2l; hutel 2h3; complements of the nucleic 



Wbr2_23l24; Mbr2_Iil7: hlbr2_4lnjl3; nfcr:_62flO; hfbr2.62ll«; hftr2_64jlS; 
hlbI2_24n20: hftd224p3; bfkd2_4ki4; btci3Jil3; btetJ_2U16; hrej3_23lll; 
htet3^26t22; bu3,4bfi; taeii _T2pl6; hu»el_l«il7; hutel_2rJM3; Butel_24ell; 



nucleic acid or protein sequence of a clone selected from the group consisting of: 
hftr2_3|S; hftr2 62ol7; Wbr2_fit24; Mbr2_7Sk24; Wtd2.24elS; bftd2_3ol7; 
Wbl2.46j20: bw3_1T1I7; Htei3_1Tnll: te*3_I7d1; hta3_2il7; tac*JJ5b5; 
bwJ_35kI6; bteU_33nl2; bte*3_33<r?: tautel_I0bl9; hutel_20mM; hutel_23el3 ; 
complements of the nucleic acid sequences; and variants thereof. 
hfbr2_13bl0: hfbrtjcll; bfbr2_64al5; hfbt2_6ol7; hfbr2_T2btl; hfbr2_T2ll2; 
hfbr3_i2i24<h(brl_10)i.htei3_14h2l; Hte»3_Dj3; ht»3_20rnlS; hte*3_22g2; htes32ml8; 
hia3_7p9; ntes3_BmlO; btnelJSl I; complements of the nucleic acid sequences; and 
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfnr2_23b2l; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; b(br2_7Sc24; 0 fbr2_*2e4 
<hfbrl_10e4): hfbr2_S2il7 (hfbrlJQ); hfcr2_S2m6 (Wbrl_10^hfkd2_46n>4; htes3JSkil; 
hte*3_lcl; hhteOJo3: htesj_20k2; htes3_2l<M; htes3_23nl9; htesWfJ; htes3_6cN; 
htes3_te24; hutel_20|2]; butel_22d2; hutel_22el2; complements of the nucleic acid 
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_l6il2; hfbr2J«12; hfbr2_22hI3; hfbrl_2bl7; hrbr2_M17; hfbr2_64k24; 
hfbr2_S2c20 (hfbrl J0cM):.hfbr2_<2el7 (bibrl_lfcl7); hfbr2_S2|l4 (nftrl JQgl4); 
b(kd324al3; hflcd2_3il3; blkd2_4mll; hmcfljall; hmcfl_lcl3; hccs3_13c6; 
btes3_2oD; hte*J_27lc4; HesJJM; htes3_33k24; hutel_l9fl9; and nutel_24ct9; 
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; Mkd2_47«4 : htes3_2cl2: bte*3_21jlS; btei3_17nl2; hutel_llil9; 
huiel IL2. complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
huiel_l7k7; hutel_18cl2; hutel_ISil9; butelJM; nutel Jill; bmtM9fl9; 
hutel_l9gl9: hutel J9g22; butel J9M7; hut*l_l9jll; nutel Ji2; hutel_20bl9; 
butel J0g21; hutel_20nl3; butel_20mll; hutel 20m24; hutel_21dlJ; butel_22d2: 
tntel_22el2: hutel_22n2; hutel_22o2; hutel_23el3; hutel J3gll; butel_I4cl9; 
r«ttel_24ell;buiel_24j6; butelJhJ; complements of the nucleic acid sequences; and 
40. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 

hutc l_l 7k?; hute I J Ic 1 2; hule l_l Ii4; hul*l_l»gl9-, hute I _19jl I ; hute I_22n2; 
huteljldli;r^l_22o2;lrjtel_13(ll; complements of the nucleic acid sequences; and 
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the 
group consisting of h<br2_l6cl6; hfbrl_16fll; hfbr2J6t,lS: hfbr2_16il2; hfbr2_l6k22; 
Wbr2_l6112; blbr2_22fll; Mbr2_22hl3; hfor2_22hl3; hfbr2_22i4; bfbr2_22U; 
hfbr2_22kS; hfbr2_23bl0; hfbr2_23b21; hfbr2_23fl; hfhrt_23l24; bfbr2_23nl6; 
b(br2_23024; hfbrlJJoi; hfbr2_2i2; hfbr2_2bl7; hlbr2_2bS; nfbr2_2cl; hfbr2_2cl7; 
hfbr2_2cl8; hfbr2_2dlJ; hfbrl_ldl7; hibr2_2d20; h(br2_2gll: hfbr2_2hl; hfbr2_2hl0; 
hfbr2_2il7; Mbr21M4; hfbr2_2kl9; hfbr2_3blS; hfbr2_3cll; bfbr3_3fl6: hrbr2JgS; 
lifbrtJK; hfbr2_41mlS; bfbr2_aZbll; hfbr2_«2fl0; hfbr2_62I19: hfbr2_62nl0; 
hfbr2_62ol7; hfbr2>t»ll; hfbr2_64*l5; MbrtWclo: hfbr2_64c4; Mbrt_64b6; 
hfbr2_64i20; Wbf3_64jlo: Hbr2>tk24; hfbrt_64ol6; blbr2>17; hfbr2_6hW; 
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl!; hfbr2_72dl3; bfhrt 72112; 
hfbr2_72ml6; Mbr2,7Inl2; hlbr2_78c24; hfbr2_7Bdl3; hfbr2 78W4; hfbr2_78nI3; 
Wbr2_7«24; hfbr2_7e22: Mbr2_7j4; hfbr2_t2c20; hfbrl J0C2O; hfbr2_t2*l7; 
hfbrl_l0el7; Wbr2_B2e4;; hfbrl_I0=4; hlbr2_B2gl4;; hfbrl_l0|14; hrbr2_l2U7;; 
MbflJO. hfbr2_t2iI4;; hfbrl_10; hfbr2_S2ml6:: hfbtllO; bfbr2_l2mn;: hfbrlJO: 
hfluC_lj9; hfkd2_24al5; hfkd2_24bl5: hfkd2 24e2); kfkd2_24n20: hfk<I2J4p3; 
hfkd2Ji!3; hfkiO Jol7; hfkd2_4c*6; hfkd2_46bl0; htVd2_46dl3; hfkd2_46j20; 
hfM2_46kl9; Mkd2_46m4: hfkd2_47i4; hfkd2_4b6; bfkd2_4cS; nfkd2_4kl4; 
hfkd2_4ml]; hmcfljill; hmcfl_lc23; toncfMelJ; hmcn_lgl3; W»e»3_ln3; 
hte*3_l4gj; htei3_Uh2I; bte*3J4pl4; h*ei3J4p7; ote*3_IJil3: We»3JSc24; 
htei3JSc6; htet3_l3gl4; bte»3_iShI; bte»3_I5i5; btejJ.lSjU: Htei3_15p; htc*3_l3lfll; 
htei3_17nO; btra3J7»7; htetf_I7nl2; b»3J7nl!; Htei3_l8D; btet3JSI7; 
btes3_19fI9; hteU_I9jl7; hte*3Jcl; btei3_lgl3; hte»3Jlr.Ii; hteOMtll; bte*3_2Qk2; 
btei3_20tai]S; htet3_21d4; htci3_21Jl£ btes3_IU16; htei3_2lu23; bn3_22c23; 
hte*3_22gZ; htta3_22nl3; htt*3_23lll; htei3_23nl9; Hie*3_:3nl9; tae*3_26(22; 
het3_27dl; nte*3,27k4; hte«3_27oI4; he*3_ISdl4; htei3_2ill; bte>3_2il7; htc.3_2dlS; 
hte*3_2el2; bteUJfU; hte»3_2l7; htei3_2hl; hte*3_2hl5: btca3_2U9; hte»3_2ml8; 
hUxiJmX); bwj_ln9; too3_2oU; hwJ_30«; HteO_35W; taei3_3JhS: hw3_35e21; 
htt*3_33|6; htn3_35kI6; hto3JJk24; hteOJJolI; hlei3_33n24; bte*3_3Jfl»; 
bt*»3_33pl7; htt»3_3Jp2I: ht«3_4b4: hwJ_4fl7: htti3j«fJ; Iaei3_4h6; htei3_4ol9: 
hte*3_3QH: htci3_50n06; du3,30d23: htr»3 6621: hw3_«cll; bto3jai6; ta3_72kll: 
Hta3jmiJ; taet3_71pl6: hlo3_7bI2: hte*3_7dl7; t*aiJJ#; taef3_7jS; I**3_7pl0; 
Wei3_7p9; hei3_Sc24; Hw3_8|l 1 1 We»3_8i5; ta»3_SreI0; Hlo3_«p7; Hle*3_9e22; 
Htn3^9i20; Hw3_9k22; hwel_l7k7; hutel lkll: Iwel Jill*. hntelJM; htnel JSJ1; 
hutel_l9fl9; hu«el_l9il9; huteM9|23; bme1_1«il7; hutcl_19jll: hu*1_lt2: 
butcl_20t>l9;huttl_20f21; huulJObU. buttlJOmll; httlel_Z0m24; huteMldlS; 
bual_22tt2; bu*l_22il2; hutel_Z2n2; huiel_22o2: btncl_13el3; butel_23(ll; 
huKl_24cl9;huiel_24c11; hu«tl_14j6; hmelJHJ; their 



42. A polypeptide 




a host cell a peptide encoded by the nucleic acid molecule according to claim 41. 